SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: Thomas, Winston J. 

Drayna, Dennis T. 
Feder, John N. 
Gnirke, Andreas 
Ruddy, David 
Tsuchihashi, Zenta
Wolff, Roger K.

(ii) TITLE OF INVENTION: Hereditary Hemochromatosis Gene 

(iii) NUMBER OF SEQUENCES: 44 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Townsend and Townsend and Crew LLP 

(B) STREET: Two Embarcadero Center, Eighth Floor

(C) CITY: San Francisco 

(D) STATE: California 

(E) COUNTRY: USA 

(F) ZIP: 94111-3834 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: US 08/652,265 

(B) FILING DATE: 23-MAY-1996

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Smith, William M. 

(B) REGISTRATION NUMBER: 30,223 

(C) REFERENCE/DOCKET NUMBER: 17957-000500 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (415) 576-0200 

(B) TELEFAX: (415) 576-0300



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10825 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS

(B) LOCATION: join(361..436, 3762..4025, 4235..4510, 5606..5881,
6040..6153, 7107..7147)

(D) OTHER INFORMATION: /product= "Hereditary Hemochromatosis
(HH) protein"

/note= "Normal or wild-type (unaffected)
Hereditary Hemochromatosis (HH) gene
allele"
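
As a consistency check on the CDS location above, the following minimal Python sketch sums the six joined intervals and divides by three. The names CDS_EXONS and cds_length are illustrative only and are not part of the listing; the coordinates are copied from the join(...) location and are treated as 1-based and inclusive.

```python
# Exon intervals copied from the CDS join(...) location (1-based, inclusive).
# CDS_EXONS and cds_length are hypothetical names used only for this check.
CDS_EXONS = [
    (361, 436),
    (3762, 4025),
    (4235, 4510),
    (5606, 5881),
    (6040, 6153),
    (7107, 7147),
]


def cds_length(exons):
    """Return the total number of nucleotides covered by the joined intervals."""
    return sum(end - start + 1 for start, end in exons)


total_nt = cds_length(CDS_EXONS)
codons, remainder = divmod(total_nt, 3)
print(total_nt, codons, remainder)  # prints: 1047 349 0
```

The 349 codons correspond to the 348 amino acids given for SEQ ID NO:2 plus one stop codon (TGA at positions 7145..7147).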

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 140.. 7319 

(D) OTHER INFORMATION: /note= "start and stop positions for 

normal or wild-type (unaffected) allele 
cDNA (SEQ ID NO:9)"

(ix) FEATURE: 

(A) NAME/KEY: -

(B) LOCATION: 3852.. 3891

(D) OTHER INFORMATION: /note= "start and stop positions for

normal or wild-type (unaffected) genomic
sequence surrounding variant for 24d2(C)
allele (SEQ ID NO:41)"

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 5507.. 6023 

(D) OTHER INFORMATION: /note= "start and stop positions for 

normal or wild-type (unaffected) genomic 
sequence surrounding variant for 24d1(G)
allele (SEQ ID NO:20)"

(ix) FEATURE: 

(A) NAME/KEY: allele 

(B) LOCATION: replace (3872 , "c") 

(D) OTHER INFORMATION: /phenotype= "normal or wild-type 

(unaffected) " 
/label= 24d2 

(ix) FEATURE: 

(A) NAME/KEY: allele 

(B) LOCATION: replace (3878 , "a") 

(D) OTHER INFORMATION: /phenotype= "normal or wild-type

(unaffected)"
/label= 24d7

(ix) FEATURE: 

(A) NAME/KEY: allele 

(B) LOCATION: replace (5834 , "g") 

(D) OTHER INFORMATION: /phenotype= "normal or wild-type

(unaffected)"
/label= 24d1
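
The three allele positions above are genomic coordinates. The sketch below, which assumes 1-based inclusive coordinates and uses the hypothetical helper names genomic_to_cds and codon_of, shows one way to map them onto the CDS defined by the join(...) location and from there to a codon number.

```python
# Exon intervals copied from the CDS join(...) location (1-based, inclusive).
CDS_EXONS = [
    (361, 436),
    (3762, 4025),
    (4235, 4510),
    (5606, 5881),
    (6040, 6153),
    (7107, 7147),
]


def genomic_to_cds(pos, exons):
    """Map a genomic position to a 1-based CDS position, or None if not in an exon."""
    offset = 0  # coding bases contributed by the exons already passed
    for start, end in exons:
        if start <= pos <= end:
            return offset + (pos - start + 1)
        offset += end - start + 1
    return None


def codon_of(cds_pos):
    """Return (codon number, base within codon), both 1-based."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1


for label, pos in [("24d2", 3872), ("24d7", 3878), ("24d1", 5834)]:
    cds_pos = genomic_to_cds(pos, CDS_EXONS)
    print(label, pos, cds_pos, codon_of(cds_pos))
# prints:
# 24d2 3872 187 (63, 1)
# 24d7 3878 193 (65, 1)
# 24d1 5834 845 (282, 2)
```

Under these assumptions the positions fall in codons 63 (His), 65 (Ser) and 282 (Cys) of the translation shown in SEQ ID NO:2.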



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

TCTAAGGTTG AGATAAAATT TTTAAATGTA TGATTGAATT TTGAAAATCA TAAATATTTA 60 

AATATCTAAA GTTCAGATCA GAACATTGCG AAGCTACTTT CCCCAATCAA CAACACCCCT 120 

TCAGGATTTA AAAACCAAGG GGGACACTGG ATCACCTAGT GTTTCACAAG CAGGTACCTT 180 

CTGCTGTAGG AGAGAGAGAA CTAAAGTTCT GAAAGACCTG TTGCTTTTCA CCAGGAAGTT 240 

TTACTGGGCA TCTCCTGAGC CTAGGCAATA GCTGTAGGGT GACTTCTGGA GCCATCCCCG 300

TTTCCCCGCC CCCCAAAAGA AGCGGAGATT TAACGGGGAC GTGCGGCCAG AGCTGGGGAA 360

ATG GGC CCG CGA GCC AGG CCG GCG CTT CTC CTC CTG ATG CTT TTG CAG 408
Met Gly Pro Arg Ala Arg Pro Ala Leu Leu Leu Leu Met Leu Leu Gln
1 5 10 15

ACC GCG GTC CTG CAG GGG CGC TTG CTG C GTGAGTCCGA GGGCTGCGGG 456
Thr Ala Val Leu Gln Gly Arg Leu Leu
20 25








CGAACTAGGG GCGCGGCGGG GGTGGAAAAA TCGAAACTAG CTTTTTCTTT GCGCTTGGGA 516

GTTTGCTAAC TTTGGAGGAC CTGCTCAACC CTATCCGCAA GCCCCTCTCC CTACTTTCTG 576

CGTCCAGACC CCGTGAGGGA GTGCCTACCA CTGAACTGCA GATAGGGGTC CCTCGCCCCA 636

GGACCTGCCC CCTCCCCCGG CTGTCCCGGC TCTGCGGAGT GACTTTTGGA ACCGCCCACT 696

CCCTTCCCCC AACTAGAATG CTTTTAAATA AATCTCGTAG TTCCTCACTT GAGCTGAGCT 756

AAGCCTGGGG CTCCTTGAAC CTGGAACTCG GGTTTATTTC CAATGTCAGC TGTGCAGTTT 816

TTTCCCCAGT CATCTCCAAA CAGGAAGTTC TTCCCTGAGT GCTTGCCGAG AAGGCTGAGC 876

AAACCCACAG CAGGATCCGC ACGGGGTTTC CACCTCAGAA CGAATGCGTT GGGCGGTGGG 936

GGCGCGAAAG AGTGGCGTTG GGGATCTGAA TTCTTCACCA TTCCACCCAC TTTTGGTGAG 996

ACCTGGGGTG GAGGTCTCTA GGGTGGGAGG CTCCTGAGAG AGGCCTACCT CGGGCCTTTC 1056

CCCACTCTTG GCAATTGTTC TTTTGCCTGG AAAATTAAGT ATATGTTAGT TTTGAACGTT 1116

TGAACTGAAC AATTCTCTTT TCGGCTAGGC TTTATTGATT TGCAATGTGC TGTGTAATTA 1176

AGAGGCCTCT CTACAAAGTA CTGATAATGA ACATGTAAGC AATGCACTCA CTTCTAAGTT 1236

ACATTCATAT CTGATCTTAT TTGATTTTCA CTAGGCATAG GGAGGTAGGA GCTAATAATA 1296

CGTTTATTTT ACTAGAAGTT AACTGGAATT CAGATTATAT AACTCTTTTC AGGTTACAAA 1356

GAACATAAAT AATCTGGTTT TCTGATGTTA TTTCAAGTAC TACAGCTGCT TCTAATCTTA 1416

GTTGACAGTG ATTTTGCCCT GTAGTGTAGC ACAGTGTTCT GTGGGTCACA CGCCGGCCTC 1476

AGCACAGCAC TTTGAGTTTT GGTACTACGT GTATCCACAT TTTACACATG ACAAGAATGA 1536

GGCATGGCAC GGCCTGCTTC CTGGCAAATT TATTCAATGG TACACTGGGC TTTGGTGGCA 1596

GAGCTCATGT CTCCACTTCA TAGCTATGAT TCTTAAACAT CACACTGCAT TAGAGGTTGA 1656

ATAATAAAAT TTCATGTTGA GCAGAAATAT TCATTGTTTA CAAGTGTAAA TGAGTCCCAG 1716

CCATGTGTTG CACTGTTCAA GCCCCAAGGG AGAGAGCAGG GAAACAAGTC TTTACCCTTT 1776

GATATTTTGC ATTCTAGTGG GAGAGATGAC AATAAGCAAA TGAGCAGAAA GATATACAAC 1836

ATCAGGAAAT CATGGGTGTT GTGAGAAGCA GAGAAGTCAG GGCAAGTCAC TCTGGGGCTG 1896

ACACTTGAGC AGAGACATGA AGGAAATAAG AATGATATTG ACTGGGAGCA GTATTTCCCA 1956

GGCAAACTGA GTGGGCCTGG CAAGTTGGAT TAAAAAGCGG GTTTTCTCAG CACTACTCAT 2016

GTGTGTGTGT GTGGGGGGGG GGGGCGGCGT GGGGGTGGGA AGGGGGACTA CCATCTGCAT 2076


GTAGGATGTC TAGCAGTATC CTGTCCTCCC TACTCACTAG GTGCTAGGAG CACTCCCCCA 2136

GTCTTGACAA CCAAAAATGT CTCTAAACTT TGCCACATGT CACCTAGTAG ACAAACTCCT 2196

GGTTAAGAAG CTCGGGTTGA AAAAAATAAA CAAGTAGTGC TGGGGAGTAG AGGCCAAGAA 2256

GTAGGTAATG GGCTCAGAAG AGGAGCCACA AACAAGGTTG TGCAGGCGCC TGTAGGCTGT 2316

GGTGTGAATT CTAGCCAAGG AGTAACAGTG ATCTGTCACA GGCTTTTAAA AGATTGCTCT 2376

GGCTGCTATG TGGAAAGCAG AATGAAGGGA GCAACAGTAA AAGCAGGGAG CCCAGCCAGG 2436

AAGCTGTTAC ACAGTCCAGG CAAGAGGTAG TGGAGTGGGC TGGGTGGGAA CAGAAAAGGG 2496

AGTGACAAAC CATTGTCTCC TGAATATATT CTGAAGGAAG TTGCTGAAGG ATTCTATGTT 2556

GTGTGAGAGA AAGAGAAGAA TTGGCTGGGT GTAGTAGCTC ATGCCAAGGA GGAGGCCAAG 2616

GAGAGCAGAT TCCTGAGCTC AGGAGTTCAA GACCAGCCTG GGCAACACAG CAAAACCCCT 2676

TCTCTACAAA AAATACAAAA ATTAGCTGGG TGTGGTGGCA TGCACCTGTG ATCCTAGCTA 2736

CTCGGGAGGC TGAGGTGGAG GGTATTGCTT GAGCCCAGGA AGTTGAGGCT GCAGTGAGCC 2796

ATGACTGTGC CACTGTACTT CAGCCTAGGT GACAGAGCAA GACCCTGTCT CCCCTGACCC 2856

CCTGAAAAAG AGAAGAGTTA AAGTTGACTT TGTTCTTTAT TTTAATTTTA TTGGCCTGAG 2916

CAGTGGGGTA ATTGGCAATG CCATTTCTGA GATGGTGAAG GCAGAGGAAA GAGCAGTTTG 2976

GGGTAAATCA AGGATCTGCA TTTGGGACAT GTTAAGTTTG AGATTCCAGT CAGGCTTCCA 3036

AGTGGTGAGG CCACATAGGC AGTTCAGTGT AAGAATTCAG GACCAAGGCT GGGCACGGTG 3096

GCTCACTTCT GTAATCCCAG CACTTTGGTG GCTGAGGCAG GTAGATCATT TGAGGTCAGG 3156

AGTTTGAGAC AAGCTTGGCC AACATGGTGA AACCCCATGT CTACTAAAAA TACAAAAATT 3216

AGCCTGGTGT GGTGGCGCAC GCCTATAGTC CCAGGTTTTC AGGAGGCTTA GGTAGGAGAA 3276

TCCCTTGAAC CCAGGAGGTG CAGGTTGCAG TGAGCTGAGA TTGTGCCACT GCACTCCAGC 3336

CTGGGTGATA GAGTGAGACT CTGTCTCAAA AAAAAAAAAA AAAAAAAAAA AAAAAACTGA 3396

AGGAATTATT CCTCAGGATT TGGGTCTAAT TTGCCCTGAG CACCAACTCC TGAGTTCAAC 3456

TACCATGGCT AGACACACCT TAACATTTTC TAGAATCCAC CAGCTTTAGT GGAGTCTGTC 3516

TAATCATGAG TATTGGAATA GGATCTGGGG GCAGTGAGGG GGTGGCAGCC ACGTGTGGCA 3576

GAGAAAAGCA CACAAGGAAA GAGCACCCAG GACTGTCATA TGGAAGAAAG ACAGGACTGC 3636

AACTCACCCT TCACAAAATG AGGACCAGAC ACAGCTGATG GTATGAGTTG ATGCAGGTGT 3696

GTGGAGCCTC AACATCCTGC TCCCCTCCTA CTACACATGG TTAAGGCCTG TTGCTCTGTC 3756



TCCAG GT TCA CAC TCT CTG CAC TAC CTC TTC ATG GGT GCC TCA GAG 3802
Arg Ser His Ser Leu His Tyr Leu Phe Met Gly Ala Ser Glu
30 35

CAG GAC CTT GGT CTT TCC TTG TTT GAA GCT TTG GGC TAC GTG GAT GAC 3850
Gln Asp Leu Gly Leu Ser Leu Phe Glu Ala Leu Gly Tyr Val Asp Asp
40 45 50 55

CAG CTG TTC GTG TTC TAT GAT CAT GAG AGT CGC CGT GTG GAG CCC CGA 3898
Gln Leu Phe Val Phe Tyr Asp His Glu Ser Arg Arg Val Glu Pro Arg
60 65 70

ACT CCA TGG GTT TCC AGT AGA ATT TCA AGC CAG ATG TGG CTG CAG CTG 3946
Thr Pro Trp Val Ser Ser Arg Ile Ser Ser Gln Met Trp Leu Gln Leu
75 80 85

AGT CAG AGT CTG AAA GGG TGG GAT CAC ATG TTC ACT GTT GAC TTC TGG 3994
Ser Gln Ser Leu Lys Gly Trp Asp His Met Phe Thr Val Asp Phe Trp
90 95 100

ACT ATT ATG GAA AAT CAC AAC CAC AGC AAG G GTATGTGGAG AGGGGGCCTC 4045
Thr Ile Met Glu Asn His Asn His Ser Lys
105 110

ACCTTCCTGA GGTTGTCAGA GCTTTTCATC TTTTCATGCA TCTTGAAGGA AACAGCTGGA 4105 

AGTCTGAGGT CTTGTGGGAG CAGGGAAGAG GGAAGGAATT TGCTTCCTGA GATCATTTGG 4165 

TCCTTGGGGA TGGTGGAAAT AGGGACCTAT TCCTTTGGTT GCAGTTAACA AGGCTGGGGA 4225 

TTTTTCCAG AG TCC CAC ACC CTG CAG GTC ATC CTG GGC TGT GAA ATG 4272
Glu Ser His Thr Leu Gln Val Ile Leu Gly Cys Glu Met
115 120 125

CAA GAA GAC AAC AGT ACC GAG GGC TAC TGG AAG TAC GGG TAT GAT GGG 4320
Gln Glu Asp Asn Ser Thr Glu Gly Tyr Trp Lys Tyr Gly Tyr Asp Gly
130 135 140

CAG GAC CAC CTT GAA TTC TGC CCT GAC ACA CTG GAT TGG AGA GCA GCA 4368
Gln Asp His Leu Glu Phe Cys Pro Asp Thr Leu Asp Trp Arg Ala Ala
145 150 155

GAA CCC AGG GCC TGG CCC ACC AAG CTG GAG TGG GAA AGG CAC AAG ATT 4416
Glu Pro Arg Ala Trp Pro Thr Lys Leu Glu Trp Glu Arg His Lys Ile
160 165 170

CGG GCC AGG CAG AAC AGG GCC TAC CTG GAG AGG GAC TGC CCT GCA CAG 4464
Arg Ala Arg Gln Asn Arg Ala Tyr Leu Glu Arg Asp Cys Pro Ala Gln
175 180 185 190

CTG CAG CAG TTG CTG GAG CTG GGG AGA GGT GTT TTG GAC CAA CAA G 4510
Leu Gln Gln Leu Leu Glu Leu Gly Arg Gly Val Leu Asp Gln Gln
195 200 205


GTATGGTGGA AACACACTTC TGCCCCTATA CTCTAGTGGC AGAGTGGAGG AGGTTGCAGG 4570

GCACGGAATC CCTGGTTGGA GTTTCAGAGG TGGCTGAGGC TGTGTGCCTC TCCAAATTCT 4630

GGGAAGGGAC TTTCTCAATC CTAGAGTCTC TACCTTATAA TTGAGATGTA TGAGACAGCC 4690

ACAAGTCATG GGTTTAATTT CTTTTCTCCA TGCATATGGC TCAAAGGGAA GTGTCTATGG 4750

CCCTTGCTTT TTATTTAACC AATAATCTTT TGTATATTTA TACCTGTTAA AAATTCAGAA 4810

ATGTCAAGGC CGGGCACGGT GGCTCACCCC TGTAATCCCA GCACTTTGGG AGGCCGAGGC 4870

GGGTGGTCAC AAGGTCAGGA GTTTGAGACC AGCCTGACCA ACATGGTGAA ACCCGTCTCT 4930

AAAAAAATAC AAAAATTAGC TGGTCACAGT CATGCGCACC TGTAGTCCCA GCTAATTGGA 4990

AGGCTGAGGC AGGAGCATCG CTTGAACCTG GGAAGCGGAA GTTGCACTGA GCCAAGATCG 5050

CGCCACTGCA CTCCAGCCTA GGCAGCAGAG TGAGACTCCA TCTTAAAAAA AAAAAAAAAA 5110

AAAAAAAGAG AATTCAGAGA TCTCAGCTAT CATATGAATA CCAGGACAAA ATATCAAGTG 5170

AGGCCACTTA TCAGAGTAGA AGAATCCTTT AGGTTAAAAG TTTCTTTCAT AGAACATAGC 5230

AATAATCACT GAAGCTACCT ATCTTACAAG TCCGCTTCTT ATAACAATGC CTGCTAGGTT 5290

GACCCAGGTG AAACTGACCA TCTGTATTCA ATCATTTTCA ATGCACATAA AGGGCAATTT 5350

TATCTATCAG AACAAAGAAC ATGGGTAACA GATATGTATA TTTACATGTG AGGAGAACAA 5410

GCTGATCTGA CTGCTCTCCA AGTGACACTG TGTTAGAGTC CAATCTTAGG ACACAAAATG 5470

GTGTCTCTCC TGTAGCTTGT TTTTTTCTGA AAAGGGTATT TCCTTCCTCC AACCTATAGA 5530

AGGAAGTGAA AGTTCCAGTC TTCCTGGCAA GGGTAAACAG ATCCCCTCTC CTCATCCTTC 5590

CTCTTTCCTG TCAAG TG CCT CCT TTG GTG AAG GTG ACA CAT CAT GTG ACC 5640
Val Pro Pro Leu Val Lys Val Thr His His Val Thr
210 215

TCT TCA GTG ACC ACT CTA CGG TGT CGG GCC TTG AAC TAC TAC CCC CAG 5688
Ser Ser Val Thr Thr Leu Arg Cys Arg Ala Leu Asn Tyr Tyr Pro Gln
220 225 230

AAC ATC ACC ATG AAG TGG CTG AAG GAT AAG CAG CCA ATG GAT GCC AAG 5736
Asn Ile Thr Met Lys Trp Leu Lys Asp Lys Gln Pro Met Asp Ala Lys
235 240 245

GAG TTC GAA CCT AAA GAC GTA TTG CCC AAT GGG GAT GGG ACC TAC CAG 5784
Glu Phe Glu Pro Lys Asp Val Leu Pro Asn Gly Asp Gly Thr Tyr Gln
250 255 260 265

GGC TGG ATA ACC TTG GCT GTA CCC CCT GGG GAA GAG CAG AGA TAT ACG 5832
Gly Trp Ile Thr Leu Ala Val Pro Pro Gly Glu Glu Gln Arg Tyr Thr
270 275 280

TGC CAG GTG GAG CAC CCA GGC CTG GAT CAG CCC CTC ATT GTG ATC TGG G 5881
Cys Gln Val Glu His Pro Gly Leu Asp Gln Pro Leu Ile Val Ile Trp
285 290 295

GTATGTGACT GATGAGAGCC AGGAGCTGAG AAAATCTATT GGGGGTTGAG AGGAGTGCCT 5941 

GAGGAGGTAA TTATGGCAGT GAGATGAGGA TCTGCTCTTT GTTAGGGGGT GGGCTGAGGG 6001 

TGGCAATCAA AGGCTTTAAC TTGCTTTTTC TGTTTTAG AG CCC TCA CCG TCT 6053
Glu Pro Ser Pro Ser
300

GGC ACC CTA GTC ATT GGA GTC ATC AGT GGA ATT GCT GTT TTT GTC GTC 6101
Gly Thr Leu Val Ile Gly Val Ile Ser Gly Ile Ala Val Phe Val Val
305 310 315

ATC TTG TTC ATT GGA ATT TTG TTC ATA ATA TTA AGG AAG AGG CAG GGT 6149
Ile Leu Phe Ile Gly Ile Leu Phe Ile Ile Leu Arg Lys Arg Gln Gly
320 325 330

TCA A GTGAGTAGGA ACAAGGGGGA AGTCTCTTAG TACCTCTGCC CCAGGGCACA 6203
Ser
335



GTGGGAAGAG GGGCAGAGGG GATCTGGCAT CCATGGGAAG CATTTTTCTC ATTTATATTC 6263

TTTGGGGACA CCAGCAGCTC CCTGGGAGAC AGAAAATAAT GGTTCTCCCC AGAATGAAAG 6323

TCTCTAATTC AACAAACATC TTCAGAGCAC CTACTATTTT GCAAGAGCTG TTTAAGGTAG 6383

TACAGGGGCT TTGAGGTTGA GAAGTCACTG TGGCTATTCT CAGAACCCAA ATCTGGTAGG 6443

GAATGAAATT GATAGCAAGT AAATGTAGTT AAAGAAGACC CCATGAGGTC CTAAAGCAGG 6503

CAGGAAGCAA ATGCTTAGGG TGTCAAAGGA AAGAATGATC ACATTCAGCT GGGGATCAAG 6563

ATAGCCTTCT GGATCTTGAA GGAGAAGCTG GATTCCATTA GGTGAGGTTG AAGATGATGG 6623

GAGGTCTACA CAGACGGAGC AACCATGCCA AGTAGGAGAG TATAAGGCAT ACTGGGAGAT 6683

TAGAAATAAT TACTGTACCT TAACCCTGAG TTTGCGTAGC TATCACTCAC CAATTATGCA 6743

TTTCTACCCC CTGAACATCT GTGGTGTAGG GAAAAGAGAA TCAGAAAGAA GCCAGCTCAT 6803

ACAGAGTCCA AGGGTCTTTT GGGATATTGG GTTATGATCA CTGGGGTGTC ATTGAAGGAT 6863

CCTAAGAAAG GAGGACCACG ATCTCCCTTA TATGGTGAAT GTGTTGTTAA GAAGTTAGAT 6923

GAGAGGTGAG GAGACCAGTT AGAAAGCCAA TAAGCATTTC CAGATGAGAG ATAATGGTTC 6983

TTGAAATCCA ATAGTGCCCA GGTCTAAATT GAGATGGGTG AATGAGGAAA ATAAGGAAGA 7043

GAGT^GAGGC AAGATGGTGC CTAGGTTTGT GATGCCTCTT TCCTGGGTCT CTTGTCTCCA 7103

GAG GA GGA GCC ATG GGG CAC TAC GTC TTA GCT GAA CGT GAG 7144
Arg Gly Ala Met Gly His Tyr Val Leu Ala Glu Arg Glu
340 345


TGACACGCAG CCTGCAGACT CACTGTGGGA AGGAGACAAA ACTAGAGACT CAAA6AGGGA 7204

GTGCATTTAT GAGCTCTTCA TGTTTCAGGA GAGAGTTGAA CCTAAACATA GAAATTGCCT 7264

GACGAACTCC TTGATTTTAG CCTTCTCTGT TCATTTCCTC AAAAAGATTT CCCCATTTAG 7324

GTTTCTGAGT TCCTGCATGC CGGTGATCCC TAGCTGTGAC CTCTCCCCTG GAACTGTCTC 7384

TCATGAACCT CAAGCTGCAT CTAGAGGCTT CCTTCATTTC CTCCGTCACC TCAGAGACAT 7444

ACACCTATGT CATTTCATTT CCTATTTTTG GAAGAGGACT CCTTAAATTT GGGGGACTTA 7504

CATGATTCAT TTTAACATCT GAGAAAAGCT TTGAACCCTG GGACGTGGCT AGTCATAACC 7564

TTACCAGATT TTTACACATG TATCTATGCA TTTTCTGGAC CCGTTCAACT TTTCCTTTGA 7624

ATCCTCTCTC TGTGTTACCC AGTAACTCAT CTGTCACCAA GCCTTGGGGA TTCTTCCATC 7684

TGATTGTGAT GTGAGTTGCA CAGCTATGAA GGCTGTACAC TGCACGAATG GAAGAGGCAC 7744

CTGTCCCAGA AAAAGCATCA TGGCTATCTG TGGGTAGTAT GATGGGTGTT TTTAGCAGGT 7804

TATPTTG A A A


f3GGG' I *n ^ci'va a 


A G A GnTGTTT 
/iV3/i^j^ X Vj X X X 


X X X V_ X/i/\X Xo 


GP ATG A AGnT 


f 00% 


GTrATAPAnA 


TTTGP A A A GT 


TTA ti'VCXCWCXC^ 
X XArlXVjuXv?^ 


PTTP A TTTGG 
X X V.»ri XXX uVj 


GATGPTAPTP 


TAGTATTPPA 
XxWSXnX XV-Vh^rl 


7 Q9 J. 




ATPAPAATAA 


' PT' 'PT A C^C^T* 
1 X 1 X ^ XxiV..'^ X 


nGTPTPTPPT 
\3\3 X U X L« X X 


X VJ X X V- X \Ji\ X n 


ATGA A A ATTA 
/i X Oxiriri/i X Xri 


n QQA 
f jO*k 


TGATAAGGAT 


GATAAAAGGA 


PTT A PnrTPGT* 


GTPPG A PTPT 


TPTG A GP A PP 


TAPTT AP ATG 
X X X X U 


0 ft 


C! ATT APTGP A 


TGPAPTTPTT 

X O^i-^nV* X X ^ X X 


AP A ATA ATTr* 


TATGAGATAG 


GT APT ATT AT 
Vj X XnX X n X 


PPPP ATTTPT' 

V_V_V^V_AX X X \v X 




TTTTTA A ATG 

X X X X X/vu-LXVj 


AAGAAAGTGA 


AGTAGGPPGG 


OWa^oVj X VjvjL. 


TPAPGPPTGT 
X V^/i.L.O^L' X V7 X 


A ATPPP AGPA 




CTTTGGGAGG 


PPAAAGPGGG 


TGGATPAPGA 


GGTPAGGAGA 


TPGAGAPPAT 


PPTGGPTAAP 


0 ^ ^ t 


ATGGTG A A A P 


PPPATPTPTA 


ATAAAAATAP 


AAAAAATTAG 


PTGGGPGTGG 


TGGPAGAPGP 


0 ^ 0 ft 


PTGTAGTrrr 


AGPTAPTPGG 


A AGGPTG AGG 


P AGGAGA ATG 


GPATGAAPPP 
vjv_^ X v:iivvv»v_v. 


AGGAGGPAGA 


0 J fcfr 


OV.- X 1 oCAo 1 \J 


Ao^L.oAol 1 X 


vjCoLpL^L. X oV. 


AV_ X L.UAV7CC X 


A r*(^T/^ a 0 A ^ti a 

A^VJ X vjAkJAoA 


Ky X oAurAL. X 


P^ 0^ 


ATPTP A A a 2i A 


aaTA AaaaTa 


A A A ATA A AAA 


AATGA A A Aa A 


AA A AGA A AGT 


GA AGTATAGA 


C3 ^ D ft 


T TV Tr^nr^ TV T\ 

o 1 A 1 L. 1 CAl A 


^1 ^n^n^p^' ^^^^ TV 
V3 1 1 X X UAo X 


^ A X AoAAAvJA 


Uvi X r X C-AAAU 


X vJALr X L.AAX L. 


X(jAL.L.oX X XO 


0 C y1 




a p T\ /^r* a r**v tv 


LAX X UAij X Avj 


X X XAuAXvjV-V, 


TAr* A A TA A A f 

XAvjAAX AAAX 


ar'ar'a ar*r*a a 
AVjACjAAtjVjAA 


0 C Q j4 




XUXCX XoX 


A T' 1 'P p 
UXvJAX XuXVjX 


X X L X X X o AV3 


X uAVjU X X oAA 


X VJALiA X Vj AAo 




Gr!(^ A a a nr* a 


r'a a a ar'a arr* 

va AAAAL^hAI- c 


AAV- X (jjAX V^C X 


vJAVav_ X ij X vJA X 


oX X X X X X A 


AAAvj X L>V.^U> X Vj 


0 / Uffc 


a APGa unf^Tn 


X vavj AA X X V3 


AC X V,V„L X X vjV- 


XV^v. XV^XVjX X\3 


/"tip/ irn/^rpi 1 11 
L.XV«XV-XXX V3\J 


\-M.X XV-AX X XV_ 


one A 

0 / Oft 




AuvjCAAvjuAL. 


XuXAAX Xu^X 


vjui^VaAwioL. X 


AV3 X OVjCLU X KS 


L. X (jijfjL. X X Wi 


Q Q 9 A 




v^U X L'UU X Avii? 


L.\JAU X IjUL. X V. 


X vio AU X (JAVaA 


AU X U X LjVj X ijVjf 


XAX X XUv-U Xk. 


QQ Q A 


A ATGaar*TGr« 


a a p PTPT 

Ao X AAoL. X X 


f^^T^f^ TV M"I"I*'I'P A 
U X vJAX XXX oA 


r'ATnr'TaTaa 

oAXurvjXAXAA 


Tnr'aaGr'r'ap 


pa aGTGGPTT 
wiAoXooCX X 


Q Q A A 




vJAvi^ X UL. XXL. 


vJAX tjV?At7L.LA. 


L.X0000X XUL. 


VioXuUAUAX X 


aaaaaaaaaa 
AAAAAAAAAA 






APAT^rnAnGA 


A TTr" p*r a n a T 

AX XVjUXAoAX 


TT'Tnnr'a a at* 

X L. X VjVjtjAAAX 


p anTTPA pr* A 


TGTTP A A A AP 
X u X X LAHAAVj 


^UOft 


a /*i'M/ '••1 HI II 1 II 1 1*1 lip 
xClxllxl 


XXXXXXXX v^A 


r* a r*Tr"T»a tit* 
OAL.XUXAX Xo 


L-ULAvitjv, X Vjtj 


AoXoLAAXoo 


p a TP a TPTPP 
vJA X IjA X L. X L.^ 




GPTPAPTGTA 


A PPTPTGPPT 


PPPAG^2TTr»A 
V^^^WVoV^X XvJA 


Avj^VjAX X V, X V 


X X L X VJAoV^ 


PTPPPa APT A 


Ql PA 
^X.04 


nr*'T*r*r*r' a Tfa 


uA^tVjCVj X VjvJA 


CvJACCAX vjCL. 


LtivjCXAAX X X 


X XbXAX X X X X 


AP«T"APAP APA 

AC? X AvsAoAL. A 


Q O A A 


nncTTTn a rr* 

vjorVa i i i V^AUU 


AXvjX XviVjV^CA 


taVsU XVjvaXuXk. 


UAAv, X C X uv^ X 


CjAC^L. X UVj X UA 


X UUkjv»L. X VaUC 




TCGGCCTCCC AAAGTGCTGA GATTACAGGT GTGAGCCACC CTGCCCAGCC GTCAAAAGAG 9364

TCTTAATATA TATATCCAGA TGGCATGTGT TTACTTTATG TTACTACATG CACTTGGCTG 9424

CATAAATGTG GTACAAGCAT TCTGTCTTGA AGGGCAGGTG CTTCAGGATA CCATATACAG 9484

CTCAGAAGTT TCTTCTTTAG GCATTAAATT TTAGCAAAGA TATCTCATCT CTTCTTTTAA 9544

ACCATTTTCT TTTTTTGTGG TTAGAAAAGT TATGTAGAAA AAAGTAAATG TGATTTACGC 9604

TCATTGTAGA AAAGCTATAA AATGAATACA ATTAAAGCTG TTATTTAATT AGCCAGTGAA 9664

AAACTATTAA CAACTTGTCT ATTACCTGTT AGTATTATTG TTGCATTAAA AATGCATATA 9724

CTTTAATAAA TGTATATTGT ATTGTATACT GCATGATTTT ATTGAAGTTC TTGTTCATCT 9784

TGTGTATATA CTTAATCGCT TTGTCATTTT GGAGACATTT ATTTTGCTTC TAATTTCTTT 9844

ACATTTTGTC TTACGGAATA TTTTCATTCA ACTGTGGTAG CCGAATTAAT CGTGTTTCTT 9904

CACTCTAGGG ACATTGTCGT CTAAGTTGTA AGACATTGGT TATTTTACCA GCAAACCATT 9964

CTGAAAGCAT ATGACAAATT ATTTCTCTCT TAATATCTTA CTATACTGAA AGCAGACTGC 10024

TATAAGGCTT CACTTACTCT TCTACCTCAT AAGGAATATG TTACAATTAA TTTATTAGGT 10084

AAGCATTTGT TTTATATTGG TTTTATTTCA CCTGGGCTGA GATTTCAAGA AACACCCCAG 10144

TCTTCACAGT AACACATTTC ACTAACACAT TTACTAAACA TCAGCAACTG TGGCCTGTTA 10204

ATTTTTTTAA TAGAAATTTT AAGTCCTCAT TTTCTTTCGG TGTTTTTTAA GCTTAATTTT 10264

TCTGGCTTTA TTCATAAATT CTTAAGGTCA ACTACATTTG AAAAATCAAA GACCTGCATT 10324

TTAAATTCTT ATTCACCTCT GGCAAAACCA TTCACAAACC ATGGTAGTAA AGAGAAGGGT 10384

GACACCTGGT GGCCATAGGT AAATGTACCA CGGTGGTCCG GTGACCAGAG ATGCAGCGCT 10444

GAGGGTTTTC CTGAAGGTAA AGGAATAAAG AATGGGTGGA GGGGCGTGCA CTGGAAATCA 10504

CTTGTAGAGA AAAGCCCCTG AAAATTTGAG AAAACAAACA AGAAACTACT TACCAGCTAT 10564

TTGAATTGCT GGAATCACAG GCCATTGCTG AGCTGCCTGA ACTGGGAACA CAACAGAAGG 10624

AAAACAAACC ACTCTGATAA TCATTGAGTC AAGTACAGCA GGTGATTGAG GACTGCTGAG 10684

AGGTACAGGC CAAAATTCTT ATGTTGTATT ATAATAATGT CATCTTATAA TACTGTCAGT 10744

ATTTTATAAA ACATTCTTCA CAAACTCACA CACATTTAAA AACAAAACAC TGTCTCTAAA 10804

ATCCCCAAAT TTTTCATAAA C 10825



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 348 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Gly Pro Arg Ala Arg Pro Ala Leu Leu Leu Leu Met Leu Leu Gln
1 5 10 15

Thr Ala Val Leu Gln Gly Arg Leu Leu Arg Ser His Ser Leu His Tyr
20 25 30

Leu Phe Met Gly Ala Ser Glu Gln Asp Leu Gly Leu Ser Leu Phe Glu
35 40 45

Ala Leu Gly Tyr Val Asp Asp Gln Leu Phe Val Phe Tyr Asp His Glu
50 55 60

Ser Arg Arg Val Glu Pro Arg Thr Pro Trp Val Ser Ser Arg Ile Ser
65 70 75 80

Ser Gln Met Trp Leu Gln Leu Ser Gln Ser Leu Lys Gly Trp Asp His
85 90 95

Met Phe Thr Val Asp Phe Trp Thr Ile Met Glu Asn His Asn His Ser
100 105 110

Lys Glu Ser His Thr Leu Gln Val Ile Leu Gly Cys Glu Met Gln Glu
115 120 125

Asp Asn Ser Thr Glu Gly Tyr Trp Lys Tyr Gly Tyr Asp Gly Gln Asp
130 135 140

His Leu Glu Phe Cys Pro Asp Thr Leu Asp Trp Arg Ala Ala Glu Pro
145 150 155 160

Arg Ala Trp Pro Thr Lys Leu Glu Trp Glu Arg His Lys Ile Arg Ala
165 170 175

Arg Gln Asn Arg Ala Tyr Leu Glu Arg Asp Cys Pro Ala Gln Leu Gln
180 185 190

Gln Leu Leu Glu Leu Gly Arg Gly Val Leu Asp Gln Gln Val Pro Pro
195 200 205

Leu Val Lys Val Thr His His Val Thr Ser Ser Val Thr Thr Leu Arg
210 215 220

Cys Arg Ala Leu Asn Tyr Tyr Pro Gln Asn Ile Thr Met Lys Trp Leu
225 230 235 240

Lys Asp Lys Gln Pro Met Asp Ala Lys Glu Phe Glu Pro Lys Asp Val
245 250 255

Leu Pro Asn Gly Asp Gly Thr Tyr Gln Gly Trp Ile Thr Leu Ala Val
260 265 270

Pro Pro Gly Glu Glu Gln Arg Tyr Thr Cys Gln Val Glu His Pro Gly
275 280 285

Leu Asp Gln Pro Leu Ile Val Ile Trp Glu Pro Ser Pro Ser Gly Thr
290 295 300

Leu Val Ile Gly Val Ile Ser Gly Ile Ala Val Phe Val Val Ile Leu
305 310 315 320

Phe Ile Gly Ile Leu Phe Ile Ile Leu Arg Lys Arg Gln Gly Ser Arg
325 330 335

Gly Ala Met Gly His Tyr Val Leu Ala Glu Arg Glu
340 345

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10825 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: join(361..436, 3762..4025, 4235..4510, 5606..5881,
6040..6153, 7107..7147)

(D) OTHER INFORMATION: /product= "Hereditary Hemochromatosis
(HH) protein containing the 24d1
mutation"

/note= "Hereditary Hemochromatosis (HH)
gene 24d1 allele"

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 140.. 7319 

(D) OTHER INFORMATION: /note= "start and stop positions for 

24d1 allele cDNA (SEQ ID NO:10)"

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 3852.. 3891

(D) OTHER INFORMATION: /note= "start and stop positions for 

genomic sequence surrounding variant 
for 24d2(C) allele (SEQ ID NO:41)"

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 5507.. 6023 

(D) OTHER INFORMATION: /note= "Start and stop positions for 

genomic sequence surrounding variant 
for 24d1(A) allele (SEQ ID NO:21)"

(ix) FEATURE: 

(A) NAME/KEY: allele 

(B) LOCATION: replace (5834 , "a") 

(D) OTHER INFORMATION: /phenotype= "Hereditary Hemochromatosis 

(HH) " 

/label= 24d1
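
The effect of the replace(5834, "a") feature on the encoded protein can be illustrated with a small sketch. It assumes the wild-type codon TGC occupies genomic positions 5833..5835, as read from the SEQ ID NO:1 listing; apply_replace and CODON_TABLE are hypothetical names, and the table holds only the two codons needed here.

```python
# Wild-type codon at genomic positions 5833..5835, as read from SEQ ID NO:1.
WILD_TYPE_CODON = "TGC"
CODON_START = 5833

# Only the two codons needed for this illustration (standard genetic code).
CODON_TABLE = {"TGC": "Cys", "TAC": "Tyr"}


def apply_replace(codon, codon_start, pos, base):
    """Return the codon with the base at genomic position pos replaced."""
    index = pos - codon_start
    if not 0 <= index < len(codon):
        raise ValueError("position lies outside the codon")
    return codon[:index] + base.upper() + codon[index + 1:]


mutant = apply_replace(WILD_TYPE_CODON, CODON_START, 5834, "a")
print(WILD_TYPE_CODON, CODON_TABLE[WILD_TYPE_CODON], "->", mutant, CODON_TABLE[mutant])
# prints: TGC Cys -> TAC Tyr
```

Under these assumptions the 24d1 allele changes the Cys codon at this position to a Tyr codon.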



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 



TCTAAGGTTG AGATAAAATT TTTAAATGTA TGATTGAATT TTGAAAATCA TAAATATTTA 60

AATATCTAAA GTTCAGATCA GAACATTGCG AAGCTACTTT CCCCAATCAA CAACACCCCT 120

TCAGGATTTA AAAACCAAGG GGGACACTGG ATCACCTAGT GTTTCACAAG CAGGTACCTT 180

CTGCTGTAGG AGAGAGAGAA CTAAAGTTCT GAAAGACCTG TTGCTTTTCA CCAGGAAGTT 240

TTACTGGGCA TCTCCTGAGC CTAGGCAATA GCTGTAGGGT GACTTCTGGA GCCATCCCCG 300

TTTCCCCGCC CCCCAAAAGA AGCGGAGATT TAACGGGGAC GTGCGGCCAG AGCTGGGGAA 360

ATG GGC CCG CGA GCC AGG CCG GCG CTT CTC CTC CTG ATG CTT TTG CAG 408
Met Gly Pro Arg Ala Arg Pro Ala Leu Leu Leu Leu Met Leu Leu Gln
1 5 10 15

ACC GCG GTC CTG CAG GGG CGC TTG CTG C GTGAGTCCGA GGGCTGCGGG 456
Thr Ala Val Leu Gln Gly Arg Leu Leu
20 25

CGAACTAGGG GCGCGGCGGG GGTGGAAAAA TCGAAACTAG CTTTTTCTTT GCGCTTGGGA 516 

GTTTGCTAAC TTTGGAGGAC CTGCTCAACC CTATCCGCAA GCCCCTCTCC CTACTTTCTG 576 

CGTCCAGACC CCGTGAGGGA GTGCCTACCA CTGAACTGCA GATAGGGGTC CCTCGCCCCA 636 

GGACCTGCCC CCTCCCCCGG CTGTCCCGGC TCTGCGGAGT GACTTTTGGA ACCGCCCACT 696 

CCCTTCCCCC AACTAGAATG CTTTTAAATA AATCTCGTAG TTCCTCACTT GAGCTGAGCT 756 

AAGCCTGGGG CTCCTTGAAC CTGGAACTCG GGTTTATTTC CAATGTCAGC TGTGCAGTTT 816 

TTTCCCCAGT CATCTCCAAA CAGGAAGTTC TTCCCTGAGT GCTTGCCGAG AAGGCTGAGC 876 

AAACCCACAG CAGGATCCGC ACGGGGTTTC CACCTCAGAA CGAATGCGTT GGGCGGTGGG 936 

GGCGCGAAAG AGTGGCGTTG GGGATCTGAA TTCTTCACCA TTCCACCCAC TTTTGGTGAG 996 

ACCTGGGGTG GAGGTCTCTA GGGTGGGAGG CTCCTGAGAG AGGCCTACCT CGGGCCTTTC 1056 

CCCACTCTTG GCAATTGTTC TTTTGCCTGG AAAATTAAGT ATATGTTAGT TTTGAACGTT 1116 

TGAACTGAAC AATTCTCTTT TCGGCTAGGC TTTATTGATT TGCAATGTGC TGTGTAATTA 1176 

AGAGGCCTCT CTACAAAGTA CTGATAATGA ACATGTAAGC AATGCACTCA CTTCTAAGTT 1236 

ACATTCATAT CTGATCTTAT TTGATTTTCA CTAGGCATAG GGAGGTAGGA GCTAATAATA 1296 

CGTTTATTTT ACTAGAAGTT AACTGGAATT CAGATTATAT AACTCTTTTC AGGTTACAAA 1356 

GAACATAAAT AATCTGGTTT TCTGATGTTA TTTCAAGTAC TACAGCTGCT TCTAATCTTA 1416 

GTTGACAGTG ATTTTGCCCT GTAGTGTAGC ACAGTGTTCT GTGGGTCACA CGCCGGCCTC 1476 

AGCACAGCAC TTTGAGTTTT GGTACTACGT GTATCCACAT TTTACACATG ACAAGAATGA 1536 

GGCATGGCAC GGCCTGCTTC CTGGCAAATT TATTCAATGG TACACTGGGC TTTGGTGGCA 1596 

GAGCTCATGT CTCCACTTCA TAGCTATGAT TCTTAAACAT CACACTGCAT TAGAGGTTGA 1656 

ATAATAAAAT TTCATGTTGA GCAGAAATAT TCATTGTTTA CAAGTGTAAA TGAGTCCCAG 1716 

CCATGTGTTG CACTGTTCAA GCCCCAAGGG AGAGAGCAGG GAAACAAGTC TTTACCCTTT 1776 

GATATTTTGC ATTCTAGTGG GAGAGATGAC AATAAGCAAA TGAGCAGAAA GATATACAAC 1836 

ATCAGGAAAT CATGGGTGTT GTGAGAAGCA GAGAAGTCAG GGCAAGTCAC TCTGGGGCTG 1896 

ACACTTGAGC AGAGACATGA AGGAAATAAG AATGATATTG ACTGGGAGCA GTATTTCCCA 1956 

GGCAAACTGA GTGGGCCTGG CAAGTTGGAT TAAAAAGCGG GTTTTCTCAG CACTACTCAT 2016 

GTGTGTGTGT GTGGGGGGGG GGGGCGGCGT GGGGGTGGGA AGGGGGACTA CCATCTGCAT 2076 

GTAGGATGTC TAGCAGTATC CTGTCCTCCC TACTCACTAG GTGCTAGGAG CACTCCCCCA 2136 

GTCTTGACAA CCAAAAATGT CTCTAAACTT TGCCACATGT CACCTAGTAG ACAAACTCCT 2196

GGTTAAGAAG CTCGGGTTGA AAAAAATAAA CAAGTAGTGC TGGGGAGTAG AGGCCAAGAA 2256

GTAGGTAATG GGCTCAGAAG AGGAGCCACA AACAAGGTTG TGCAGGCGCC TGTAGGCTGT 2316

GGTGTGAATT CTAGCCAAGG AGTAACAGTG ATCTGTCACA GGCTTTTAAA AGATTGCTCT 2376

GGCTGCTATG TGGAAAGCAG AATGAAGGGA GCAACAGTAA AAGCAGGGAG CCCAGCCAGG 2436

AAGCTGTTAC ACAGTCCAGG CAAGAGGTAG TGGAGTGGGC TGGGTGGGAA CAGAAAAGGG 2496

AGTGACAAAC CATTGTCTCC TGAATATATT CTGAAGGAAG TTGCTGAAGG ATTCTATGTT 2556

GTGTGAGAGA AAGAGAAGAA TTGGCTGGGT GTAGTAGCTC ATGCCAAGGA GGAGGCCAAG 2616

GAGAGCAGAT TCCTGAGCTC AGGAGTTCAA GACCAGCCTG GGCAACACAG CAAAACCCCT 2676

TCTCTACAAA AAATACAAAA ATTAGCTGGG TGTGGTGGCA TGCACCTGTG ATCCTAGCTA 2736

CTCGGGAGGC TGAGGTGGAG GGTATTGCTT GAGCCCAGGA AGTTGAGGCT GCAGTGAGCC 2796

ATGACTGTGC CACTGTACTT CAGCCTAGGT GACAGAGCAA GACCCTGTCT CCCCTGACCC 2856

CCTGAAAAAG AGAAGAGTTA AAGTTGACTT TGTTCTTTAT TTTAATTTTA TTGGCCTGAG 2916

CAGTGGGGTA ATTGGCAATG CCATTTCTGA GATGGTGAAG GCAGAGGAAA GAGCAGTTTG 2976

GGGTAAATCA AGGATCTGCA TTTGGGACAT GTTAAGTTTG AGATTCCAGT CAGGCTTCCA 3036

AGTGGTGAGG CCACATAGGC AGTTCAGTGT AAGAATTCAG GACCAAGGCT GGGCACGGTG 3096

GCTCACTTCT GTAATCCCAG CACTTTGGTG GCTGAGGCAG GTAGATCATT TGAGGTCAGG 3156

AGTTTGAGAC AAGCTTGGCC AACATGGTGA AACCCCATGT CTACTAAAAA TACAAAAATT 3216

AGCCTGGTGT GGTGGCGCAC GCCTATAGTC CCAGGTTTTC AGGAGGCTTA GGTAGGAGAA 3276

TCCCTTGAAC CCAGGAGGTG CAGGTTGCAG TGAGCTGAGA TTGTGCCACT GCACTCCAGC 3336

CTGGGTGATA GAGTGAGACT CTGTCTCAAA AAAAAAAAAA AAAAAAAAAA AAAAAACTGA 3396

AGGAATTATT CCTCAGGATT TGGGTCTAAT TTGCCCTGAG CACCAACTCC TGAGTTCAAC 3456

TACCATGGCT AGACACACCT TAACATTTTC TAGAATCCAC CAGCTTTAGT GGAGTCTGTC 3516

TAATCATGAG TATTGGAATA GGATCTGGGG GCAGTGAGGG GGTGGCAGCC ACGTGTGGCA 3576

GAGAAAAGCA CACAAGGAAA GAGCACCCAG GACTGTCATA TGGAAGAAAG ACAGGACTGC 3636

AACTCACCCT TCACAAAATG AGGACCAGAC ACAGCTGATG GTATGAGTTG ATGCAGGTGT 3696

GTGGAGCCTC AACATCCTGC TCCCCTCCTA CTACACATGG TTAAGGCCTG TTGCTCTGTC 3756


TCCAG GT TCA CAC TCT CTG CAC TAC CTC TTC ATG GGT GCC TCA GAG 3802
Arg Ser His Ser Leu His Tyr Leu Phe Met Gly Ala Ser Glu
30 35

CAG GAC CTT GGT CTT TCC TTG TTT GAA GCT TTG GGC TAC GTG GAT GAC 3850
Gln Asp Leu Gly Leu Ser Leu Phe Glu Ala Leu Gly Tyr Val Asp Asp
40 45 50 55

CAG CTG TTC GTG TTC TAT GAT CAT GAG AGT CGC CGT GTG GAG CCC CGA 3898
Gln Leu Phe Val Phe Tyr Asp His Glu Ser Arg Arg Val Glu Pro Arg
60 65 70

ACT CCA TGG GTT TCC AGT AGA ATT TCA AGC CAG ATG TGG CTG CAG CTG 3946
Thr Pro Trp Val Ser Ser Arg Ile Ser Ser Gln Met Trp Leu Gln Leu
75 80 85

AGT CAG AGT CTG AAA GGG TGG GAT CAC ATG TTC ACT GTT GAC TTC TGG 3994
Ser Gln Ser Leu Lys Gly Trp Asp His Met Phe Thr Val Asp Phe Trp
90 95 100

ACT ATT ATG GAA AAT CAC AAC CAC AGC AAG G GTATGTGGAG AGGGGGCCTC 4045
Thr Ile Met Glu Asn His Asn His Ser Lys
105 110

ACCTTCCTGA GGTTGTCAGA GCTTTTCATC TTTTCATGCA TCTTGAAGGA AACAGCTGGA 4105 

AGTCTGAGGT CTTGTGGGAG CAGGGAAGAG GGAAGGAATT TGCTTCCTGA GATCATTTGG 4165 

TCCTTGGGGA TGGTGGAAAT AGGGACCTAT TCCTTTGGTT GCAGTTAACA AGGCTGGGGA 4225 

TTTTTCCAG AG TCC CAC ACC CTG CAG GTC ATC CTG GGC TGT GAA ATG 4272 
Glu Ser His Thr Leu Gln Val Ile Leu Gly Cys Glu Met
115 120 125

CAA GAA GAC AAC AGT ACC GAG GGC TAC TGG AAG TAC GGG TAT GAT GGG 4320
Gln Glu Asp Asn Ser Thr Glu Gly Tyr Trp Lys Tyr Gly Tyr Asp Gly
130 135 140

CAG GAC CAC CTT GAA TTC TGC CCT GAC ACA CTG GAT TGG AGA GCA GCA 4368
Gln Asp His Leu Glu Phe Cys Pro Asp Thr Leu Asp Trp Arg Ala Ala
145 150 155

GAA CCC AGG GCC TGG CCC ACC AAG CTG GAG TGG GAA AGG CAC AAG ATT 4416
Glu Pro Arg Ala Trp Pro Thr Lys Leu Glu Trp Glu Arg His Lys Ile
160 165 170

CGG GCC AGG CAG AAC AGG GCC TAC CTG GAG AGG GAC TGC CCT GCA CAG 4464
Arg Ala Arg Gln Asn Arg Ala Tyr Leu Glu Arg Asp Cys Pro Ala Gln
175 180 185 190

CTG CAG CAG TTG CTG GAG CTG GGG AGA GGT GTT TTG GAC CAA CAA G 4510
Leu Gln Gln Leu Leu Glu Leu Gly Arg Gly Val Leu Asp Gln Gln
195 200 205




GTATGGTGGA AACACACTTC TGCCCCTATA CTCTAGTGGC AGAGTGGAGG AGGTTGCAGG 4570

GCACGGAATC CCTGGTTGGA GTTTCAGAGG TGGCTGAGGC TGTGTGCCTC TCCAAATTCT 4630

GGGAAGGGAC TTTCTCAATC CTAGAGTCTC TACCTTATAA TTGAGATGTA TGAGACAGCC 4690

ACAAGTCATG GGTTTAATTT CTTTTCTCCA TGCATATGGC TCAAAGGGAA GTGTCTATGG 4750

CCCTTGCTTT TTATTTAACC AATAATCTTT TGTATATTTA TACCTGTTAA AAATTCAGAA 4810

ATGTCAAGGC CGGGCACGGT GGCTCACCCC TGTAATCCCA GCACTTTGGG AGGCCGAGGC 4870

GGGTGGTCAC AAGGTCAGGA GTTTGAGACC AGCCTGACCA ACATGGTGAA ACCCGTCTCT 4930

AAAAAAATAC AAAAATTAGC TGGTCACAGT CATGCGCACC TGTAGTCCCA GCTAATTGGA 4990

AGGCTGAGGC AGGAGCATCG CTTGAACCTG GGAAGCGGAA GTTGCACTGA GCCAAGATCG 5050

CGCCACTGCA CTCCAGCCTA GGCAGCAGAG TGAGACTCCA TCTTAAAAAA AAAAAAAAAA 5110

AAAAAAAGAG AATTCAGAGA TCTCAGCTAT CATATGAATA CCAGGACAAA ATATCAAGTG 5170

AGGCCACTTA TCAGAGTAGA AGAATCCTTT AGGTTAAAAG TTTCTTTCAT AGAACATAGC 5230

AATAATCACT GAAGCTACCT ATCTTACAAG TCCGCTTCTT ATAACAATGC CTCCTAGGTT 5290

GACCCAGGTG AAACTGACCA TCTGTATTCA ATCATTTTCA ATGCACATAA AGGGCAATTT 5350

TATCTATCAG AACAAAGAAC ATGGGTAACA GATATGTATA TTTACATGTG AGGAGAACAA 5410

GCTGATCTGA CTGCTCTCCA AGTGACACTG TGTTAGAGTC CAATCTTAGG ACACAAAATG 5470

GTGTCTCTCC TGTAGCTTGT TTTTTTCTGA AAAGGGTATT TCCTTCCTCC AACCTATAGA 5530

AGGAAGTGAA AGTTCCAGTC TTCCTGGCAA GGGTAAACAG ATCCCCTCTC CTCATCCTTC 5590

CTCTTTCCTG TCAAG TG CCT CCT TTG GTG AAG GTG ACA CAT CAT GTG ACC 5640
Val Pro Pro Leu Val Lys Val Thr His His Val Thr
210 215

TCT TCA GTG ACC ACT CTA CGG TGT CGG GCC TTG AAC TAC TAC CCC CAG 5688 
Ser Ser Val Thr Thr Leu Arg Cys Arg Ala Leu Asn Tyr Tyr Pro Gln
220 225 230

AAC ATC ACC ATG AAG TGG CTG AAG GAT AAG CAG CCA ATG GAT GCC AAG 5736
Asn Ile Thr Met Lys Trp Leu Lys Asp Lys Gln Pro Met Asp Ala Lys
235 240 245

GAG TTC GAA CCT AAA GAC GTA TTG CCC AAT GGG GAT GGG ACC TAC CAG 5784
Glu Phe Glu Pro Lys Asp Val Leu Pro Asn Gly Asp Gly Thr Tyr Gln
250 255 260 265

GGC TGG ATA ACC TTG GCT GTA CCC CCT GGG GAA GAG CAG AGA TAT ACG 5832
Gly Trp Ile Thr Leu Ala Val Pro Pro Gly Glu Glu Gln Arg Tyr Thr
270 275 280

TAC CAG GTG GAG CAC CCA GGC CTG GAT CAG CCC CTC ATT GTG ATC TGG G 5881
Tyr Gln Val Glu His Pro Gly Leu Asp Gln Pro Leu Ile Val Ile Trp
285 290 295

GTATGTGACT GATGAGAGCC AGGAGCTGAG AAAATCTATT GGGGGTTGAG AGGAGTGCCT 5941 

GAGGAGGTAA TTATGGCAGT GAGATGAGGA TCTGCTCTTT GTTAGGGGGT GGGCTGAGGG 6001 

TGGCAATCAA AGGCTTTAAC TTGCTTTTTC TGTTTTAG AG CCC TCA CCG TCT 6053
Glu Pro Ser Pro Ser
300

GGC ACC CTA GTC ATT GGA GTC ATC AGT GGA ATT GCT GTT TTT GTC GTC 6101
Gly Thr Leu Val Ile Gly Val Ile Ser Gly Ile Ala Val Phe Val Val
305 310 315

ATC TTG TTC ATT GGA ATT TTG TTC ATA ATA TTA AGG AAG AGG CAG GGT 6149
Ile Leu Phe Ile Gly Ile Leu Phe Ile Ile Leu Arg Lys Arg Gln Gly
320 325 330

TCA A GTGAGTAGGA ACAAGGGGGA AGTCTCTTAG TACCTCTGCC CCAGGGCACA 6203
Ser
335

GTGGGAAGAG GGGCAGAGGG GATCTGGCAT CCATGGGAAG CATTTTTCTC ATTTATATTC 6263 

TTTGGGGACA CCAGCAGCTC CCTGGGAGAC AGAAAATAAT GGTTCTCCCC AGAATGAAAG 6323 

TCTCTAATTC AACAAACATC TTCAGAGCAC CTACTATTTT GCAAGAGCTG TTTAAGGTAG 6383 

TACAGGGGCT TTGAGGTTGA GAAGTCACTG TGGCTATTCT CAGAACCCAA ATCTGGTAGG 6443 

GAATGAAATT GATAGCAAGT AAATGTAGTT AAAGAAGACC CCATGAGGTC CTAAAGCAGG 6503 

CAGGAAGCAA ATGCTTAGGG TGTCAAAGGA AAGAATGATC ACATTCAGCT GGGGATCAAG 6563 

ATAGCCTTCT GGATCTTGAA GGAGAAGCTG GATTCCATTA GGTGAGGTTG AAGATGATGG 6623 

GAGGTCTACA CAGACGGAGC AACCATGCCA AGTAGGAGAG TATAAGGCAT ACTGGGAGAT 6683 

TAGAAATAAT TACTGTACCT TAACCCTGAG TTTGCGTAGC TATCACTCAC CAATTATGCA 6743 

TTTCTACCCC CTGAACATCT GTGGTGTAGG GAAAAGAGAA TCAGAAAGAA GCCAGCTCAT 6803 

ACAGAGTCCA AGGGTCTTTT GGGATATTGG GTTATGATCA CTGGGGTGTC ATTGAAGGAT 6863 

CCTAAGAAAG GAGGACCACG ATCTCCCTTA TATGGTGAAT GTGTTGTTAA GAAGTTAGAT 6923 

GAGAGGTGAG GAGACCAGTT AGAAAGCCAA TAAGCATTTC CAGATGAGAG ATAATGGTTC 6983 

TTGAAATCCA ATAGTGCCCA GGTCTAAATT GAGATGGGTG AATGAGGAAA ATAAGGAAGA 7043 

GAGAAGAGGC AAGATGGTGC CTAGGTTTGT GATGCCTCTT TCCTGGGTCT CTTGTCTCCA 7103 

CAG GA GGA GCC ATG GGG CAC TAC GTC TTA GCT GAA CGT GAG 7144
Arg Gly Ala Met Gly His Tyr Val Leu Ala Glu Arg Glu 
340 345 

TGACACGCAG CCTGCAGACT CACTGTGGGA AGGAGACAAA ACTAGAGACT CAAAGAGGGA 7204

GTGCATTTAT GAGCTCTTCA TGTTTCAGGA GAGAGTTGAA CCTAAACATA GAAATTGCCT 7264 

GACGAACTCC TTGATTTTAG CCTTCTCTGT TCATTTCCTC AAAAAGATTT CCCCATTTAG 7324 

GTTTCTGAGT TCCTGCATGC CGGTGATCCC TAGCTGTGAC CTCTCCCCTG GAACTGTCTC 7384 

TCATGAACCT CAAGCTGCAT CTAGAGGCTT CCTTCATTTC CTCCGTCACC TCAGAGACAT 7444 

ACACCTATGT CATTTCATTT CCTATTTTTG GAAGAGGACT CCTTAAATTT GGGGGACTTA 7504 

CATGATTCAT TTTAACATCT GAGAAAAGCT TTGAACCCTG GGACGTGGCT AGTCATAACC 7564 

TTACCAGATT TTTACACATG TATCTATGCA TTTTCTGGAC CCGTTCAACT TTTCCTTTGA 7624 

ATCCTCTCTC TGTGTTACCC AGTAACTCAT CTGTCACCAA GCCTTGGGGA TTCTTCCATC 7684 

TGATTGTGAT GTGAGTTGCA CAGCTATGAA GGCTGTACAC TGCACGAATG GAAGAGGCAC 7744 

CTGTCCCAGA AAAAGCATCA TGGCTATCTG TGGGTAGTAT GATGGGTGTT TTTAGCAGGT 7804 

AGGAGGCAAA TATCTTGAAA GGGGTTGTGA AGAGGTGTTT TTTCTAATTG GCATGAAGGT 7864 

GTCATACAGA TTTGCAAAGT TTAATGGTGC CTTCATTTGG GATGCTACTC TAGTATTCCA 7924 

GACCTGAAGA ATCACAATAA TTTTCTACCT GGTCTCTCCT TGTTCTGATA ATGAAAATTA 7984

TGATAAGGAT GATAAAAGCA CTTACTTCGT GTCCGACTCT TCTGAGCACC TACTTACATG 8044

CATTACTGCA TGCACTTCTT ACAATAATTC TATGAGATAG GTACTATTAT CCCCATTTCT 8104

TTTTTAAATG AAGAAAGTGA AGTAGGCCGG GCACGGTGGC TCACGCCTGT AATCCCAGCA 8164

CTTTGGGAGG CCAAAGCGGG TGGATCACGA GGTCAGGAGA TCGAGACCAT CCTGGCTAAC 8224

ATGGTGAAAC CCCATCTCTA ATAAAAATAC AAAAAATTAG CTGGGCGTGG TGGCAGACGC 8284

CTGTAGTCCC AGCTACTCGG AAGGCTGAGG CAGGAGAATG GCATGAACCC AGGAGGCAGA 8344

GCTTGCAGTG AGCCGAGTTT GCGCCACTGC ACTCCAGCCT AGGTGACAGA GTGAGACTCC 8404

ATCTCAAAAA AATAAAAATA AAAATAAAAA AATGAAAAAA AAAAGAAAGT GAAGTATAGA 8464

GTATCTCATA GTTTGTCAGT GATAGAAACA GGTTTCAAAC TCAGTCAATC TGACCGTTTG 8524

ATACATCTCA GACACCACTA CATTCAGTAG TTTAGATGCC TAGAATAAAT AGAGAAGGAA 8584

GGAGATGGCT CTTCTCTTGT CTCATTGTGT TTCTTCTGAG TGAGCTTGAA TCACATGAAG 8644

GGGAACAGCA GAAAACAACC AACTGATCCT CAGCTGTCAT GTTTCCTTTA AAAGTCCCTG 8704

AAGGAAGGTC CTGGAATGTG ACTCCCTTGC TCCTCTGTTG CTCTCTTTGG CATTCATTTC 8764

TTTGGACCCT ACGCAAGGAC TGTAATTGGT GGGGACAGCT AGTGGCCCTG CTGGGCTTCA 8824

CACACGGTGT CCTCCCTAGG CCAGTGCCTC TGGAGTCAGA ACTCTGGTGG TATTTCCCTC 8884

AATGAAGTGG AGTAAGCTCT CTCATTTTGA GATGGTATAA TGGAAGCCAC CAAGTGGCTT 8944

AGAGGATGCC CAGGTCCTTC CATGGAGCCA CTGGGGTTCC GGTGCACATT AAAAAAAAAA 9004

TCTAACCAGG ACATTCAGGA ATTGCTAGAT TCTGGGAAAT CAGTTCACCA TGTTCAAAAG 9064

AGTCTTTTTT TTTTTTTTGA GACTCTATTG CCCAGGCTGG AGTGCAATGG CATGATCTCG 9124

GCTCACTGTA ACCTCTGCCT CCCAGGTTCA AGCGATTCTC CTGTCTCAGC CTCCCAAGTA 9184

GCTGGGATTA CAGGCGTGCT CCACCATGCC CGGCTAATTT TTGTATTTTT AGTAGAGACA 9244

GGGTTTCACC ATGTTGGCCA GGCTGGTCTC GAACTCTCCT GACCTCGTGA TCCGCCTGCC 9304

TCGGCCTCCC AAAGTGCTGA GATTACAGGT GTGAGCCACC CTGCCCAGCC GTCAAAAGAG 9364

TCTTAATATA TATATCCAGA TGGCATGTGT TTACTTTATG TTACTACATG CACTTGGCTG 9424

CATAAATGTG GTACAAGCAT TCTGTCTTGA AGGGCAGGTG CTTCAGGATA CCATATACAG 9484


CTCAGAAGTT TCTTCTTTAG GCATTAAATT TTAGCAAAGA TATCTCATCT CTTCTTTTAA 9544

ACCATTTTCT TTTTTTGTGG TTAGAAAAGT TATGTAGAAA AAAGTAAATG TGATTTACGC 9604

TCATTGTAGA AAAGCTATAA AATGAATACA ATTAAAGCTG TTATTTAATT AGCCAGTGAA 9664

AAACTATTAA CAACTTGTCT ATTACCTGTT AGTATTATTG TTGCATTAAA AATGCATATA 9724

CTTTAATAAA TGTATATTGT ATTGTATACT GCATGATTTT ATTGAAGTTC TTGTTCATCT 9784

TGTGTATATA CTTAATCGCT TTGTCATTTT GGAGACATTT ATTTTGCTTC TAATTTCTTT 9844

ACATTTTGTC TTACGGAATA TTTTCATTCA ACTGTGGTAG CCGAATTAAT CGTGTTTCTT 9904

CACTCTAGGG ACATTGTCGT CTAAGTTGTA AGACATTGGT TATTTTACCA GCAAACCATT 9964

CTGAAAGCAT ATGACAAATT ATTTCTCTCT TAATATCTTA CTATACTGAA AGCAGACTGC 10024

TATAAGGCTT CACTTACTCT TCTACCTCAT AAGGAATATG TTACAATTAA TTTATTAGGT 10084

AAGCATTTGT TTTATATTGG TTTTATTTCA CCTGGGCTGA GATTTCAAGA AACACCCCAG 10144

TCTTCACAGT AACACATTTC ACTAACACAT TTACTAAACA TCAGCAACTG TGGCCTGTTA 10204

ATTTTTTTAA TAGAAATTTT AAGTCCTCAT TTTCTTTCGG TGTTTTTTAA GCTTAATTTT 10264

TCTGGCTTTA TTCATAAATT CTTAAGGTCA ACTACATTTG AAAAATCAAA GACCTGCATT 10324

TTAAATTCTT ATTCACCTCT GGCAAAACCA TTCACAAACC ATGGTAGTAA AGAGAAGGGT 10384

GACACCTGGT GGCCATAGGT AAATGTACCA CGGTGGTCCG GTGACCAGAG ATGCAGCGCT 10444

GAGGGTTTTC CTGAAGGTAA AGGAATAAAG AATGGGTGGA GGGGCGTGCA CTGGAAATCA 10504

CTTGTAGAGA AAAGCCCCTG AAAATTTGAG AAAACAAACA AGAAACTACT TACCAGCTAT 10564

TTGAATTGCT GGAATCACAG GCCATTGCTG AGCTGCCTGA ACTGGGAACA CAACAGAAGG 10624

AAAACAAACC ACTCTGATAA TCATTGAGTC AAGTACAGCA GGTGATTGAG GACTGCTGAG 10684

AGGTACAGGC CAAAATTCTT ATGTTGTATT ATAATAATGT CATCTTATAA TACTGTCAGT 10744

ATTTTATAAA ACATTCTTCA CAAACTCACA CACATTTAAA AACAAAACAC TGTCTCTAAA 10804

ATCCCCAAAT TTTTCATAAA C 10825



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 348 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Gly Pro Arg Ala Arg Pro Ala Leu Leu Leu Leu Met Leu Leu Gln
1 5 10 15

Thr Ala Val Leu Gln Gly Arg Leu Leu Arg Ser His Ser Leu His Tyr
20 25 30

Leu Phe Met Gly Ala Ser Glu Gln Asp Leu Gly Leu Ser Leu Phe Glu
35 40 45

Ala Leu Gly Tyr Val Asp Asp Gln Leu Phe Val Phe Tyr Asp His Glu
50 55 60

Ser Arg Arg Val Glu Pro Arg Thr Pro Trp Val Ser Ser Arg Ile Ser
65 70 75 80

Ser Gln Met Trp Leu Gln Leu Ser Gln Ser Leu Lys Gly Trp Asp His
85 90 95

Met Phe Thr Val Asp Phe Trp Thr Ile Met Glu Asn His Asn His Ser
100 105 110

Lys Glu Ser His Thr Leu Gln Val Ile Leu Gly Cys Glu Met Gln Glu
115 120 125

Asp Asn Ser Thr Glu Gly Tyr Trp Lys Tyr Gly Tyr Asp Gly Gln Asp
130 135 140

His Leu Glu Phe Cys Pro Asp Thr Leu Asp Trp Arg Ala Ala Glu Pro
145 150 155 160

Arg Ala Trp Pro Thr Lys Leu Glu Trp Glu Arg His Lys Ile Arg Ala
165 170 175

Arg Gln Asn Arg Ala Tyr Leu Glu Arg Asp Cys Pro Ala Gln Leu Gln
180 185 190

Gln Leu Leu Glu Leu Gly Arg Gly Val Leu Asp Gln Gln Val Pro Pro
195 200 205

Leu Val Lys Val Thr His His Val Thr Ser Ser Val Thr Thr Leu Arg
210 215 220

Cys Arg Ala Leu Asn Tyr Tyr Pro Gln Asn Ile Thr Met Lys Trp Leu
225 230 235 240

Lys Asp Lys Gln Pro Met Asp Ala Lys Glu Phe Glu Pro Lys Asp Val
245 250 255

Leu Pro Asn Gly Asp Gly Thr Tyr Gln Gly Trp Ile Thr Leu Ala Val
260 265 270

Pro Pro Gly Glu Glu Gln Arg Tyr Thr Tyr Gln Val Glu His Pro Gly
275 280 285

Leu Asp Gln Pro Leu Ile Val Ile Trp Glu Pro Ser Pro Ser Gly Thr
290 295 300

Leu Val Ile Gly Val Ile Ser Gly Ile Ala Val Phe Val Val Ile Leu
305 310 315 320

Phe Ile Gly Ile Leu Phe Ile Ile Leu Arg Lys Arg Gln Gly Ser Arg
325 330 335

Gly Ala Met Gly His Tyr Val Leu Ala Glu Arg Glu
340 345



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10825 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: join(361..436, 3762..4025, 4235..4510, 5606..5881,
6040..6153, 7107..7147)

(D) OTHER INFORMATION: /product= "Hereditary Hemochromatosis
(HH) protein containing the 24d2
mutation"

/note= "Hereditary Hemochromatosis (HH)
gene 24d2 allele"

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 140..7319

(D) OTHER INFORMATION: /note= "start and stop positions for
24d2 allele cDNA (SEQ ID NO:11)"

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 3852..3891

(D) OTHER INFORMATION: /note= "start and stop positions for
genomic sequence surrounding variant
for 24d2(G) allele (SEQ ID NO:42)"

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 5507..6023

(D) OTHER INFORMATION: /note= "start and stop positions for
genomic sequence surrounding variant
for 24d1(G) allele (SEQ ID NO:20)"

(ix) FEATURE:

(A) NAME/KEY: allele

(B) LOCATION: replace(3872, "g")

(D) OTHER INFORMATION: /phenotype= "Hereditary Hemochromatosis
(HH)"

/label= 24d2
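
The CDS and allele locations in the feature table above can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch and not part of the filed listing: the names `CDS_SEGMENTS`, `cds_length` and `genomic_to_codon` are hypothetical, the coordinates are copied from the CDS `join(...)` location, and the final codon of the join is assumed to be the stop codon, which is consistent with the 348-residue protein length stated for the translated sequence.

```python
# Illustrative sketch only; names below are hypothetical helpers, not part of the listing.
# CDS segments copied from the FEATURE table (1-based, inclusive genomic coordinates).
CDS_SEGMENTS = [(361, 436), (3762, 4025), (4235, 4510),
                (5606, 5881), (6040, 6153), (7107, 7147)]


def cds_length(segments):
    """Total number of nucleotides covered by the joined segments."""
    return sum(end - start + 1 for start, end in segments)


def genomic_to_codon(pos, segments):
    """Map a 1-based genomic position to (codon number, base within codon).

    Returns None when the position lies outside every CDS segment.
    """
    offset = 0  # coding nucleotides in the segments already passed
    for start, end in segments:
        if start <= pos <= end:
            cds_pos = offset + (pos - start + 1)  # 1-based CDS coordinate
            return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1
        offset += end - start + 1
    return None


if __name__ == "__main__":
    total = cds_length(CDS_SEGMENTS)
    # Nucleotides in the CDS, and residues once the assumed stop codon is excluded.
    print(total, total // 3 - 1)
    # Codon and codon position affected by replace(3872, "g").
    print(genomic_to_codon(3872, CDS_SEGMENTS))
```

Under these assumptions the join covers 1047 nucleotides (349 codons, i.e. 348 residues plus a stop), and position 3872 falls on the first base of codon 63, which agrees with the Asp shown at residue 63 in the 24d2 translation.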



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 



TCTAAGGTTG AGATAAAATT TTTAAATGTA TGATTGAATT TTGAAAATCA TAAATATTTA 60

AATATCTAAA GTTCAGATCA GAACATTGCG AAGCTACTTT CCCCAATCAA CAACACCCCT 120

TCAGGATTTA AAAACCAAGG GGGACACTGG ATCACCTAGT GTTTCACAAG CAGGTACCTT 180

CTGCTGTAGG AGAGAGAGAA CTAAAGTTCT GAAAGACCTG TTGCTTTTCA CCAGGAAGTT 240

TTACTGGGCA TCTCCTGAGC CTAGGCAATA GCTGTAGGGT GACTTCTGGA GCCATCCCCG 300

TTTCCCCGCC CCCCAAAAGA AGCGGAGATT TAACGGGGAC GTGCGGCCAG AGCTGGGGAA 360

ATG GGC CCG CGA GCC AGG CCG GCG CTT CTC CTC CTG ATG CTT TTG CAG 408
Met Gly Pro Arg Ala Arg Pro Ala Leu Leu Leu Leu Met Leu Leu Gln
1 5 10 15

ACC GCG GTC CTG CAG GGG CGC TTG CTG C GTGAGTCCGA GGGCTGCGGG 456
Thr Ala Val Leu Gln Gly Arg Leu Leu
20 25

CGAACTAGGG GCGCGGCGGG GGTGGAAAAA TCGAAACTAG CTTTTTCTTT GCGCTTGGGA 516

GTTTGCTAAC TTTGGAGGAC CTGCTCAACC CTATCCGCAA GCCCCTCTCC CTACTTTCTG 576

CGTCCAGACC CCGTGAGGGA GTGCCTACCA CTGAACTGCA GATAGGGGTC CCTCGCCCCA 636 

GGACCTGCCC CCTCCCCCGG CTGTCCCGGC TCTGCGGAGT GACTTTTGGA ACCGCCCACT 696 

CCCTTCCCCC AACTAGAATG CTTTTAAATA AATCTCGTAG TTCCTCACTT GAGCTGAGCT 756 

AAGCCTGGGG CTCCTTGAAC CTGGAACTCG GGTTTATTTC CAATGTCAGC TGTGCAGTTT 816 

TTTCCCCAGT CATCTCCAAA CAGGAAGTTC TTCCCTGAGT GCTTGCCGAG AAGGCTGAGC 876 

AAACCCACAG CAGGATCCGC ACGGGGTTTC CACCTCAGAA CGAATGCGTT GGGCGGTGGG 936 

GGCGCGAAAG AGTGGCGTTG GGGATCTGAA TTCTTCACCA TTCCACCCAC TTTTGGTGAG 996 

ACCTGGGGTG GAGGTCTCTA GGGTGGGAGG CTCCTGAGAG AGGCCTACCT CGGGCCTTTC 1056 

CCCACTCTTG GCAATTGTTC TTTTGCCTGG AAAATTAAGT ATATGTTAGT TTTGAACGTT 1116 

TGAACTGAAC AATTCTCTTT TCGGCTAGGC TTTATTGATT TGCAATGTGC TGTGTAATTA 1176 

AGAGGCCTCT CTACAAAGTA CTGATAATGA ACATGTAAGC AATGCACTCA CTTCTAAGTT 1236 

ACATTCATAT CTGATCTTAT TTGATTTTCA CTAGGCATAG GGAGGTAGGA GCTAATAATA 1296 

CGTTTATTTT ACTAGAAGTT AACTGGAATT CAGATTATAT AACTCTTTTC AGGTTACAAA 1356 

GAACATAAAT AATCTGGTTT TCTGATGTTA TTTCAAGTAC TACAGCTGCT TCTAATCTTA 1416 

GTTGACAGTG ATTTTGCCCT GTAGTGTAGC ACAGTGTTCT GTGGGTCACA CGCCGGCCTC 1476 

AGCACAGCAC TTTGAGTTTT GGTACTACGT GTATCCACAT TTTACACATG ACAAGAATGA 1536 

GGCATGGCAC GGCCTGCTTC CTGGCAAATT TATTCAATGG TACACTGGGC TTTGGTGGCA 1596

GAGCTCATGT CTCCACTTCA TAGCTATGAT TCTTAAACAT CACACTGCAT TAGAGGTTGA 1656 

ATAATAAAAT TTCATGTTGA GCAGAAATAT TCATTGTTTA CAAGTGTAAA TGAGTCCCAG 1716 

CCATGTGTTG CACTGTTCAA GCCCCAAGGG AGAGAGCAGG GAAACAAGTC TTTACCCTTT 1776 

GATATTTTGC ATTCTAGTGG GAGAGATGAC AATAAGCAAA TGAGCAGAAA GATATACAAC 1836 

ATCAGGAAAT CATGGGTGTT GTGAGAAGCA GAGAAGTCAG GGCAAGTCAC TCTGGGGCTG 1896 

ACACTTGAGC AGAGACATGA AGGAAATAAG AATGATATTG ACTGGGAGCA GTATTTCCCA 1956 

GGCAAACTGA GTGGGCCTGG CAAGTTGGAT TAAAAAGCGG GTTTTCTCAG CACTACTCAT 2016 

GTGTGTGTGT GTGGGGGGGG GGGGCGGCGT GGGGGTGGGA AGGGGGACTA CCATCTGCAT 2076 

GTAGGATGTC TAGCAGTATC CTGTCCTCCC TACTCACTAG GTGCTAGGAG CACTCCCCCA 2136 

GTCTTGACAA CCAAAAATGT CTCTAAACTT TGCCACATGT CACCTAGTAG ACAAACTCCT 2196

GGTTAAGAAG CTCGGGTTGA AAAAAATAAA CAAGTAGTGC TGGGGAGTAG AGGCCAAGAA 2256 

GTAGGTAATG GGCTCAGAAG AGGAGCCACA AACAAGGTTG TGCAGGCGCC TGTAGGCTGT 2316 

GGTGTGAATT CTAGCCAAGG AGTAACAGTG ATCTGTCACA GGCTTTTAAA AGATTGCTCT 2376 
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GGCTGCTATG 


TGGAAAGCAG 


AATGAAGGGA 


GCAACAGTAA 


AAGCAGGGAG 


CCCAGCCAGG 


2436 


AAGCTGTTAC 


ACAGTCCAGG 


CAAGAGGTAG 


TGGAGTGGGC 


TGGGTGGGAA 


CAGAAAAGGG 


2496 


AGTGACAAAC 


CATTGTCTCC 


TGAATATATT 


CTGAAGGAAG 


TTGCTGAAGG 


ATTCTATGTT 


2556 


GTGTGAGAGA 


AAGAGAAGAA 


TTGGCTGGGT 


GTAGTAGCTC 


ATGCCAAGGA 


GGAGGCCAAG 


2616 


GAGAGCAGAT 


TCCTGAGCTC 


AGGAGTTCAA 


GACCAGCCTG 


GGCAACACAG 


CAAAACCCCT 


2676 


TCTCTACAAA 


AAATACAAAA 


ATTAGCTGGG 


TGTGGTGGCA 


TGCACCTGTG 


ATCCTAGCTA 


2736 


CTCGGGAGGC 


TGAGGTGGAG 


GGTATTGCTT 


GAGCCCAGGA 


AGTTGAGGCT 


GCAGTGAGCC 


2796 


ATGACTGTGC 


CACTGTACTT 


CAGCCTAGGT 


GACAGAGCAA 


GACCCTGTCT 


CCCCTGACCC 


2856 


CCTGAAAAAG 


AGAAGAGTTA 


AAGTTGACTT 


TGTTCTTTAT 


TTTAATTTTA 


TTGGCCTGAG 


2916 


CAGTGGGGTA 


ATTGGCAATG 


CCATTTCTGA 


GATGGTGAAG 


GCAGAGGAAA 


GAGCAGTTTG 


2976 


GGGTAAATCA 


AGGATCTGCA 


TTTGGGACAT 


GTTAAGTTTG 


AGATTCCAGT 


CAGGCTTCCA 


3036 


AGTGGTGAGG 


CCACATAGGC 


AGTTCAGTGT 


AAGAATTCAG 


GACCAAGGCT 


GGGCACGGTG 


3096 


GCTCACTTCT 


GTAATCCCAG 


CACTTTGGTG 


GCTGAGGCAG 


GTAGATCATT 


TGAGGTCAGG 


3156 


AGTTTGAGAC 


AAGCTTGGCC 


AACATGGTGA 


AACCCCATGT 


CTACTAAAAA 


TACAAAAATT 


3216 


AGCCTGGTGT 


GGTGGCGCAC 


GCCTATAGTC 


CCAGGTTTTC 


AGGAGGCTTA 


GGTAGGAGAA 


3276 


TCCCTTGAAC 


CCAGGAGGTG 


CAGGTTGCAG 


TGAGCTGAGA 


TTGTGCCACT 


GCACTCCAGC 


3336 


CTGGGTGATA 


GAGTGAGACT 


CTGTCTCAAA 


AAAAAAAAAA 


AAAAAAAAAA 


AAAAAACTGA 


3396 


AGGAATTATT 


CCTCAGGATT 


TGGGTCTAAT 


TTGCCCTGAG 


CACCAACTCC 


TGAGTTCAAC 


3456 


TACCATGGCT 


AGACACACCT 


TAACATTTTC 


TAGAATCCAC 


CAGCTTTAGT 


GGAGTCTGTC 


3516 


TAATCATGAG 


TATTGGAATA 


GGATCTGGGG 


GCAGTGAGGG 


GGTGGCAGCC 


ACGTGTGGCA 


3576 


GAGAAAAGCA 


CACAAGGAAA 


GAGCACCCAG 


GACTGTCATA 


TGGAAGAAAG 


ACAGGACTGC 


3636 


AACTCACCCT 


TCACAAAATG 


AGGACCAGAC 


ACAGCTGATG 


GTATGAGTTG 


ATGCAGGTGT 


3696 


GTGGAGCCTC 


AACATCCTGC 


TCCCCTCCTA 


CTACACATGG 


TTAAGGCCTG 


TTGCTCTGTC 


3756 


TCCAG GT TCA CAC TCT 
Arg Ser His Ser 


CTG CAC TAG CTC TTC ATG GGT GCC TCA GAG 
Leu His Tyr Leu Phe Met Gly Ala Ser Glu 


3802 



30 35 



CAG GAC CTT GGT CTT TCC TTG TTT GAA GCT TTG GGC TAC GTG GAT GAC 3850
Gln Asp Leu Gly Leu Ser Leu Phe Glu Ala Leu Gly Tyr Val Asp Asp
40 45 50 55

CAG CTG TTC GTG TTC TAT GAT GAT GAG AGT CGC CGT GTG GAG CCC CGA 3898
Gln Leu Phe Val Phe Tyr Asp Asp Glu Ser Arg Arg Val Glu Pro Arg
60 65 70

ACT CCA TGG GTT TCC AGT AGA ATT TCA AGC CAG ATG TGG CTG CAG CTG 3946
Thr Pro Trp Val Ser Ser Arg Ile Ser Ser Gln Met Trp Leu Gln Leu
75 80 85

AGT CAG AGT CTG AAA GGG TGG GAT CAC ATG TTC ACT GTT GAC TTC TGG 3994
Ser Gln Ser Leu Lys Gly Trp Asp His Met Phe Thr Val Asp Phe Trp
90 95 100

ACT ATT ATG GAA AAT CAC AAC CAC AGC AAG G GTATGTGGAG AGGGGGCCTC 4045
Thr Ile Met Glu Asn His Asn His Ser Lys
105 110

ACCTTCCTGA GGTTGTCAGA GCTTTTCATC TTTTCATGCA TCTTGAAGGA AACAGCTGGA 4105 

AGTCTGAGGT CTTGTGGGAG CAGGGAAGAG GGAAGGAATT TGCTTCCTGA GATCATTTGG 4165 

TCCTTGGGGA TGGTGGAAAT AGGGACCTAT TCCTTTGGTT GCAGTTAACA AGGCTGGGGA 4225 

TTTTTCCAG AG TCC CAC ACC CTG CAG GTC ATC CTG GGC TGT GAA ATG 4272 
Glu Ser His Thr Leu Gln Val Ile Leu Gly Cys Glu Met
115 120 125

CAA GAA GAC AAC AGT ACC GAG GGC TAC TGG AAG TAC GGG TAT GAT GGG 4320
Gln Glu Asp Asn Ser Thr Glu Gly Tyr Trp Lys Tyr Gly Tyr Asp Gly
130 135 140

CAG GAC CAC CTT GAA TTC TGC CCT GAC ACA CTG GAT TGG AGA GCA GCA 4368
Gln Asp His Leu Glu Phe Cys Pro Asp Thr Leu Asp Trp Arg Ala Ala
145 150 155

GAA CCC AGG GCC TGG CCC ACC AAG CTG GAG TGG GAA AGG CAC AAG ATT 4416
Glu Pro Arg Ala Trp Pro Thr Lys Leu Glu Trp Glu Arg His Lys Ile
160 165 170

CGG GCC AGG CAG AAC AGG GCC TAC CTG GAG AGG GAC TGC CCT GCA CAG 4464
Arg Ala Arg Gln Asn Arg Ala Tyr Leu Glu Arg Asp Cys Pro Ala Gln
175 180 185 190

CTG CAG CAG TTG CTG GAG CTG GGG AGA GGT GTT TTG GAC CAA CAA G 4510
Leu Gln Gln Leu Leu Glu Leu Gly Arg Gly Val Leu Asp Gln Gln
195 200 205



GTATGGTGGA AACACACTTC TGCCCCTATA CTCTAGTGGC AGAGTGGAGG AGGTTGCAGG 4570

GCACGGAATC CCTGGTTGGA GTTTCAGAGG TGGCTGAGGC TGTGTGCCTC TCCAAATTCT 4630

GGGAAGGGAC TTTCTCAATC CTAGAGTCTC TACCTTATAA TTGAGATGTA TGAGACAGCC 4690

ACAAGTCATG GGTTTAATTT CTTTTCTCCA TGCATATGGC TCAAAGGGAA GTGTCTATGG 4750

CCCTTGCTTT TTATTTAACC AATAATCTTT TGTATATTTA TACCTGTTAA AAATTCAGAA 4810

ATGTCAAGGC CGGGCACGGT GGCTCACCCC TGTAATCCCA GCACTTTGGG AGGCCGAGGC 4870

GGGTGGTCAC AAGGTCAGGA GTTTGAGACC AGCCTGACCA ACATGGTGAA ACCCGTCTCT 4930

AAAAAAATAC AAAAATTAGC TGGTCACAGT CATGCGCACC TGTAGTCCCA GCTAATTGGA 4990

AGGCTGAGGC AGGAGCATCG CTTGAACCTG GGAAGCGGAA GTTGCACTGA GCCAAGATCG 5050

CGCCACTGCA CTCCAGCCTA GGCAGCAGAG TGAGACTCCA TCTTAAAAAA AAAAAAAAAA 5110

AAAAAAAGAG AATTCAGAGA TCTCAGCTAT CATATGAATA CCAGGACAAA ATATCAAGTG 5170

AGGCCACTTA TCAGAGTAGA AGAATCCTTT AGGTTAAAAG TTTCTTTCAT AGAACATAGC 5230

AATAATCACT GAAGCTACCT ATCTTACAAG TCCGCTTCTT ATAACAATGC CTCCTAGGTT 5290

GACCCAGGTG AAACTGACCA TCTGTATTCA ATCATTTTCA ATGCACATAA AGGGCAATTT 5350

TATCTATCAG AACAAAGAAC ATGGGTAACA GATATGTATA TTTACATGTG AGGAGAACAA 5410

GCTGATCTGA CTGCTCTCCA AGTGACACTG TGTTAGAGTC CAATCTTAGG ACACAAAATG 5470

GTGTCTCTCC TGTAGCTTGT TTTTTTCTGA AAAGGGTATT TCCTTCCTCC AACCTATAGA 5530

AGGAAGTGAA AGTTCCAGTC TTCCTGGCAA GGGTAAACAG ATCCCCTCTC CTCATCCTTC 5590

CTCTTTCCTG TCAAG TG CCT CCT TTG GTG AAG GTG ACA CAT CAT GTG ACC 5640
Val Pro Pro Leu Val Lys Val Thr His His Val Thr
210 215

TCT TCA GTG ACC ACT CTA CGG TGT CGG GCC TTG AAC TAC TAC CCC CAG 5688
Ser Ser Val Thr Thr Leu Arg Cys Arg Ala Leu Asn Tyr Tyr Pro Gln
220 225 230

AAC ATC ACC ATG AAG TGG CTG AAG GAT AAG CAG CCA ATG GAT GCC AAG 5736
Asn Ile Thr Met Lys Trp Leu Lys Asp Lys Gln Pro Met Asp Ala Lys
235 240 245

GAG TTC GAA CCT AAA GAC GTA TTG CCC AAT GGG GAT GGG ACC TAC CAG 5784
Glu Phe Glu Pro Lys Asp Val Leu Pro Asn Gly Asp Gly Thr Tyr Gln
250 255 260 265

GGC TGG ATA ACC TTG GCT GTA CCC CCT GGG GAA GAG CAG AGA TAT ACG 5832
Gly Trp Ile Thr Leu Ala Val Pro Pro Gly Glu Glu Gln Arg Tyr Thr
270 275 280

TGC CAG GTG GAG CAC CCA GGC CTG GAT CAG CCC CTC ATT GTG ATC TGG G 5881
Cys Gln Val Glu His Pro Gly Leu Asp Gln Pro Leu Ile Val Ile Trp
285 290 295

GTATGTGACT GATGAGAGCC AGGAGCTGAG AAAATCTATT GGGGGTTGAG AGGAGTGCCT 5941 

GAGGAGGTAA TTATGGCAGT GAGATGAGGA TCTGCTCTTT GTTAGGGGGT GGGCTGAGGG 6001 

TGGCAATCAA AGGCTTTAAC TTGCTTTTTC TGTTTTAG AG CCC TCA CCG TCT 6053
Glu Pro Ser Pro Ser
300

GGC ACC CTA GTC ATT GGA GTC ATC AGT GGA ATT GCT GTT TTT GTC GTC 6101
Gly Thr Leu Val Ile Gly Val Ile Ser Gly Ile Ala Val Phe Val Val
305 310 315

ATC TTG TTC ATT GGA ATT TTG TTC ATA ATA TTA AGG AAG AGG CAG GGT 6149
Ile Leu Phe Ile Gly Ile Leu Phe Ile Ile Leu Arg Lys Arg Gln Gly
320 325 330

TCA A GTGAGTAGGA ACAAGGGGGA AGTCTCTTAG TACCTCTGCC CCAGGGCACA 6203
Ser
335

GTGGGAAGAG GGGCAGAGGG GATCTGGCAT CCATGGGAAG CATTTTTCTC ATTTATATTC 6263 

TTTGGGGACA CCAGCAGCTC CCTGGGAGAC AGAAAATAAT GGTTCTCCCC AGAATGAAAG 6323 

TCTCTAATTC AACAAACATC TTCAGAGCAC CTACTATTTT GCAAGAGCTG TTTAAGGTAG 6383

TACAGGGGCT TTGAGGTTGA GAAGTCACTG TGGCTATTCT CAGAACCCAA ATCTGGTAGG 6443

GAATGAAATT GATAGCAAGT AAATGTAGTT AAAGAAGACC CCATGAGGTC CTAAAGCAGG 6503 

CAGGAAGCAA ATGCTTAGGG TGTCAAAGGA AAGAATGATC ACATTCAGCT GGGGATCAAG 6563 

ATAGCCTTCT GGATCTTGAA GGAGAAGCTG GATTCCATTA GGTGAGGTTG AAGATGATGG 6623 

GAGGTCTACA CAGACGGAGC AACCATGCCA AGTAGGAGAG TATAAGGCAT ACTGGGAGAT 6683 

TAGAAATAAT TACTGTACCT TAACCCTGAG TTTGCGTAGC TATCACTCAC CAATTATGCA 6743 

TTTCTACCCC CTGAACATCT GTGGTGTAGG GAAAAGAGAA TCAGAAAGAA GCCAGCTCAT 6803 

ACAGAGTCCA AGGGTCTTTT GGGATATTGG GTTATGATCA CTGGGGTGTC ATTGAAGGAT 6863 

CCTAAGAAAG GAGGACCACG ATCTCCCTTA TATGGTGAAT GTGTTGTTAA GAAGTTAGAT 6923 

GAGAGGTGAG GAGACCAGTT AGAAAGCCAA TAAGCATTTC CAGATGAGAG ATAATGGTTC 6983 

TTGAAATCCA ATAGTGCCCA GGTCTAAATT GAGATGGGTG AATGAGGAAA ATAAGGAAGA 7043 

GAGAAGAGGC AAGATGGTGC CTAGGTTTGT GATGCCTCTT TCCTGGGTCT CTTGTCTCCA 7103 

CAG GA GGA GCC ATG GGG CAC TAC GTC TTA GCT GAA CGT GAG 7144
Arg Gly Ala Met Gly His Tyr Val Leu Ala Glu Arg Glu 
340 345 



TGACACGCAG CCTGCAGACT CACTGTGGGA AGGAGACAAA ACTAGAGACT CAAAGAGGGA 7204

GTGCATTTAT GAGCTCTTCA TGTTTCAGGA GAGAGTTGAA CCTAAACATA GAAATTGCCT 7264

GACGAACTCC TTGATTTTAG CCTTCTCTGT TCATTTCCTC AAAAAGATTT CCCCATTTAG 7324

GTTTCTGAGT TCCTGCATGC CGGTGATCCC TAGCTGTGAC CTCTCCCCTG GAACTGTCTC 7384

TCATGAACCT CAAGCTGCAT CTAGAGGCTT CCTTCATTTC CTCCGTCACC TCAGAGACAT 7444

ACACCTATGT CATTTCATTT CCTATTTTTG GAAGAGGACT CCTTAAATTT GGGGGACTTA 7504

CATGATTCAT TTTAACATCT GAGAAAAGCT TTGAACCCTG GGACGTGGCT AGTCATAACC 7564

TTACCAGATT TTTACACATG TATCTATGCA TTTTCTGGAC CCGTTCAACT TTTCCTTTGA 7624

ATCCTCTCTC TGTGTTACCC AGTAACTCAT CTGTCACCAA GCCTTGGGGA TTCTTCCATC 7684

TGATTGTGAT GTGAGTTGCA CAGCTATGAA GGCTGTACAC TGCACGAATG GAAGAGGCAC 7744

CTGTCCCAGA AAAAGCATCA TGGCTATCTG TGGGTAGTAT GATGGGTGTT TTTAGCAGGT 7804

AGGAGGCAAA TATCTTGAAA GGGGTTGTGA AGAGGTGTTT TTTCTAATTG GCATGAAGGT 7864

GTCATACAGA TTTGCAAAGT TTAATGGTGC CTTCATTTGG GATGCTACTC TAGTATTCCA 7924

GACCTGAAGA ATCACAATAA TTTTCTACCT GGTCTCTCCT TGTTCTGATA ATGAAAATTA 7984

TGATAAGGAT GATAAAAGCA CTTACTTCGT GTCCGACTCT TCTGAGCACC TACTTACATG 8044

CATTACTGCA TGCACTTCTT ACAATAATTC TATGAGATAG GTACTATTAT CCCCATTTCT 8104

TTTTTAAATG AAGAAAGTGA AGTAGGCCGG GCACGGTGGC TCACGCCTGT AATCCCAGCA 8164

CTTTGGGAGG CCAAAGCGGG TGGATCACGA GGTCAGGAGA TCGAGACCAT CCTGGCTAAC 8224

ATGGTGAAAC CCCATCTCTA ATAAAAATAC AAAAAATTAG CTGGGCGTGG TGGCAGACGC 8284 

CTGTAGTCCC AGCTACTCGG AAGGCTGAGG CAGGAGAATG GCATGAACCC AGGAGGCAGA 8344 

GCTTGCAGTG AGCCGAGTTT GCGCCACTGC ACTCCAGCCT AGGTGACAGA GTGAGACTCC 8404 

ATCTCAAAAA AATAAAAATA AAAATAAAAA AATGAAAAAA AAAAGAAAGT GAAGTATAGA 8464 

GTATCTCATA GTTTGTCAGT GATAGAAACA GGTTTCAAAC TCAGTCAATC TGACCGTTTG 8524 

ATACATCTCA GACACCACTA CATTCAGTAG TTTAGATGCC TAGAATAAAT AGAGAAGGAA 8584 

GGAGATGGCT CTTCTCTTGT CTCATTGTGT TTCTTCTGAG TGAGCTTGAA TCACATGAAG 8644 

GGGAACAGCA GAAAACAACC AACTGATCCT CAGCTGTCAT GTTTCCTTTA AAAGTCCCTG 8704 

AAGGAAGGTC CTGGAATGTG ACTCCCTTGC TCCTCTGTTG CTCTCTTTGG CATTCATTTC 8764

TTTGGACCCT ACGCAAGGAC TGTAATTGGT GGGGACAGCT AGTGGCCCTG CTGGGCTTCA 8824 

CACACGGTGT CCTCCCTAGG CCAGTGCCTC TGGAGTCAGA ACTCTGGTGG TATTTCCCTC 8884 

AATGAAGTGG AGTAAGCTCT CTCATTTTGA GATGGTATAA TGGAAGCCAC CAAGTGGCTT 8944 

AGAGGATGCC CAGGTCCTTC CATGGAGCCA CTGGGGTTCC GGTGCACATT AAAAAAAAAA 9004 

TCTAACCAGG ACATTCAGGA ATTGCTAGAT TCTGGGAAAT CAGTTCACCA TGTTCAAAAG 9064 

AGTCTTTTTT TTTTTTTTGA GACTCTATTG CCCAGGCTGG AGTGCAATGG CATGATCTCG 9124 

GCTCACTGTA ACCTCTGCCT CCCAGGTTCA AGCGATTCTC CTGTCTCAGC CTCCCAAGTA 9184 

GCTGGGATTA CAGGCGTGCT CCACCATGCC CGGCTAATTT TTGTATTTTT AGTAGAGACA 9244

GGGTTTCACC ATGTTGGCCA GGCTGGTCTC GAACTCTCCT GACCTCGTGA TCCGCCTGCC 9304 

TCGGCCTCCC AAAGTGCTGA GATTACAGGT GTGAGCCACC CTGCCCAGCC GTCAAAAGAG 9364 

TCTTAATATA TATATCCAGA TGGCATGTGT TTACTTTATG TTACTACATG CACTTGGCTG 9424 

CATAAATGTG GTACAAGCAT TCTGTCTTGA AGGGCAGGTG CTTCAGGATA CCATATACAG 9484 

CTCAGAAGTT TCTTCTTTAG GCATTAAATT TTAGCAAAGA TATCTCATCT CTTCTTTTAA 9544 

ACCATTTTCT TTTTTTGTGG TTAGAAAAGT TATGTAGAAA AAAGTAAATG TGATTTACGC 9604 

TCATTGTAGA AAAGCTATAA AATGAATACA ATTAAAGCTG TTATTTAATT AGCCAGTGAA 9664 

AAACTATTAA CAACTTGTCT ATTACCTGTT AGTATTATTG TTGCATTAAA AATGCATATA 9724 

CTTTAATAAA TGTATATTGT ATTGTATACT GCATGATTTT ATTGAAGTTC TTGTTCATCT 9784 

TGTGTATATA CTTAATCGCT TTGTCATTTT GGAGACATTT ATTTTGCTTC TAATTTCTTT 9844 

ACATTTTGTC TTACGGAATA TTTTCATTCA ACTGTGGTAG CCGAATTAAT CGTGTTTCTT 9904 

CACTCTAGGG ACATTGTCGT CTAAGTTGTA AGACATTGGT TATTTTACCA GCAAACCATT 9964 

CTGAAAGCAT ATGACAAATT ATTTCTCTCT TAATATCTTA CTATACTGAA AGCAGACTGC 10024

TATAAGGCTT CACTTACTCT TCTACCTCAT AAGGAATATG TTACAATTAA TTTATTAGGT 10084

AAGCATTTGT TTTATATTGG TTTTATTTCA CCTGGGCTGA GATTTCAAGA AACACCCCAG 10144

TCTTCACAGT AACACATTTC ACTAACACAT TTACTAAACA TCAGCAACTG TGGCCTGTTA 10204

ATTTTTTTAA TAGAAATTTT AAGTCCTCAT TTTCTTTCGG TGTTTTTTAA GCTTAATTTT 10264

TCTGGCTTTA TTCATAAATT CTTAAGGTCA ACTACATTTG AAAAATCAAA GACCTGCATT 10324

TTAAATTCTT ATTCACCTCT GGCAAAACCA TTCACAAACC ATGGTAGTAA AGAGAAGGGT 10384

GACACCTGGT GGCCATAGGT AAATGTACCA CGGTGGTCCG GTGACCAGAG ATGCAGCGCT 10444

GAGGGTTTTC CTGAAGGTAA AGGAATAAAG AATGGGTGGA GGGGCGTGCA CTGGAAATCA 10504

CTTGTAGAGA AAAGCCCCTG AAAATTTGAG AAAACAAACA AGAAACTACT TACCAGCTAT 10564

TTGAATTGCT GGAATCACAG GCCATTGCTG AGCTGCCTGA ACTGGGAACA CAACAGAAGG 10624

AAAACAAACC ACTCTGATAA TCATTGAGTC AAGTACAGCA GGTGATTGAG GACTGCTGAG 10684

AGGTACAGGC CAAAATTCTT ATGTTGTATT ATAATAATGT CATCTTATAA TACTGTCAGT 10744

ATTTTATAAA ACATTCTTCA CAAACTCACA CACATTTAAA AACAAAACAC TGTCTCTAAA 10804

ATCCCCAAAT TTTTCATAAA C 10825



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 348 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Gly Pro Arg Ala Arg Pro Ala Leu Leu Leu Leu Met Leu Leu Gln
1 5 10 15

Thr Ala Val Leu Gln Gly Arg Leu Leu Arg Ser His Ser Leu His Tyr
20 25 30

Leu Phe Met Gly Ala Ser Glu Gln Asp Leu Gly Leu Ser Leu Phe Glu
35 40 45

Ala Leu Gly Tyr Val Asp Asp Gln Leu Phe Val Phe Tyr Asp Asp Glu
50 55 60

Ser Arg Arg Val Glu Pro Arg Thr Pro Trp Val Ser Ser Arg Ile Ser
65 70 75 80

Ser Gln Met Trp Leu Gln Leu Ser Gln Ser Leu Lys Gly Trp Asp His
85 90 95

Met Phe Thr Val Asp Phe Trp Thr Ile Met Glu Asn His Asn His Ser
100 105 110

Lys Glu Ser His Thr Leu Gln Val Ile Leu Gly Cys Glu Met Gln Glu
115 120 125

Asp Asn Ser Thr Glu Gly Tyr Trp Lys Tyr Gly Tyr Asp Gly Gln Asp
130 135 140

His Leu Glu Phe Cys Pro Asp Thr Leu Asp Trp Arg Ala Ala Glu Pro
145 150 155 160

Arg Ala Trp Pro Thr Lys Leu Glu Trp Glu Arg His Lys Ile Arg Ala
165 170 175

Arg Gln Asn Arg Ala Tyr Leu Glu Arg Asp Cys Pro Ala Gln Leu Gln
180 185 190

Gln Leu Leu Glu Leu Gly Arg Gly Val Leu Asp Gln Gln Val Pro Pro
195 200 205

Leu Val Lys Val Thr His His Val Thr Ser Ser Val Thr Thr Leu Arg
210 215 220

Cys Arg Ala Leu Asn Tyr Tyr Pro Gln Asn Ile Thr Met Lys Trp Leu
225 230 235 240

Lys Asp Lys Gln Pro Met Asp Ala Lys Glu Phe Glu Pro Lys Asp Val
245 250 255

Leu Pro Asn Gly Asp Gly Thr Tyr Gln Gly Trp Ile Thr Leu Ala Val
260 265 270

Pro Pro Gly Glu Glu Gln Arg Tyr Thr Cys Gln Val Glu His Pro Gly
275 280 285

Leu Asp Gln Pro Leu Ile Val Ile Trp Glu Pro Ser Pro Ser Gly Thr
290 295 300

Leu Val Ile Gly Val Ile Ser Gly Ile Ala Val Phe Val Val Ile Leu
305 310 315 320

Phe Ile Gly Ile Leu Phe Ile Ile Leu Arg Lys Arg Gln Gly Ser Arg
325 330 335

Gly Ala Met Gly His Tyr Val Leu Ala Glu Arg Glu
340 345



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10825 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: join(361..436, 3762..4025, 4235..4510, 5606..5881,

6040..6153, 7107..7147)
(D) OTHER INFORMATION: /product= "Hereditary Hemochromatosis

(HH) protein containing both the 24d1
and 24d2 mutations"

/note= "Hereditary Hemochromatosis (HH)
gene containing a combination of both
24d1 and 24d2 alleles"

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 140.. 7319 

(D) OTHER INFORMATION: /note= "start and stop positions for 

cDNA containing a combination of both 
24d1 and 24d2 alleles
(SEQ ID NO: 12) " 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 3852.. 3891 

(D) OTHER INFORMATION: /note= "start and stop positions for 

genomic sequence surrounding variant 
for 24d2(G) allele (SEQ ID NO: 42)" 

(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 5507.. 6023 

(D) OTHER INFORMATION: /note= "start and stop positions for 

genomic sequence surrounding variant 
for 24d1(A) allele (SEQ ID NO: 21)"

(ix) FEATURE: 

(A) NAME/KEY: allele 

(B) LOCATION: replace (3872 , "g") 

(D) OTHER INFORMATION: /phenotype= "Hereditary Hemochromatosis 

(HH) " 

/label= 24d2 

(ix) FEATURE: 

(A) NAME/KEY: allele 

(B) LOCATION: replace (5834 , "a") 

(D) OTHER INFORMATION: /phenotype= "Hereditary Hemochromatosis 

(HH) " 

/label= 24d1



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 



TCTAAGGTTG AGATAAAATT TTTAAATGTA TGATTGAATT TTGAAAATCA TAAATATTTA 60

AATATCTAAA GTTCAGATCA GAACATTGCG AAGCTACTTir CCCCAATCAA CAACACCCCT 120

TCAGGATTTA AAAACCMlGG GGGACACTGG ATCACCTAGT GTTTCACAAG CAGGTACCTT 180

CTGCTGTAGG AGAGAGAGAA CTAAAGTTCT GAAAGACCTG TTGCTTTTCA CCAGGAAGTT 240

TTACTGGGCA TCTCCTGAGC CTAGGCAATA GCTGTAGGGT GACTTCTGGA GCCATCCCCG 300

TTTCCCCGCC CCCCAAAAGA AGCGGAGATT TAACGGGGAC GTGCGGCCAG AGCTGGGGAA 360

ATG GGC CCG CGA GCC AGG CCG GCG CTT CTC CTC CTG ATG CTT TTG CAG 408
Met Gly Pro Arg Ala Arg Pro Ala Leu Leu Leu Leu Met Leu Leu Gln
1 5 10 15

ACC GCG GTC CTG CAG GGG CGC TTG CTG C GTGAGTCCGA GGGCTGCGGG 456
Thr Ala Val Leu Gln Gly Arg Leu Leu
20 25

CGAACTAGGG GCGCGGCGGG GGTGGAAAAA TCGAAACTAG CTTTTTCTTT GCGCTTGGGA 516

GTTTGCTAAC TTTGGAGGAC CTGCTCAACC CTATCCGCAA GCCCCTCTCC CTACTTTCTG 576

CGTCCAGACC CCGTGAGGGA GTGCCTACCA CTGAACTGCA GATAGGGGTC CCTCGCCCCA 636

GGACCTGCCC CCTCCCCCGG CTGTCCCGGC TCTGCGGAGT GACTTTTGGA ACCGCCCACT 696

CCCTTCCCCC AACTAGAATG CTTTTAAATA AATCTCGTAG TTCCTCACTT GAGCTGAGCT 756

AAGCCTGGGG CTCCTTGAAC CTGGAACTCG GGTTTATTTC CAATGTCAGC TGTGCAGTTT 816

TTTCCCCAGT CATCTCCAAA CAGGAAGTTC TTCCCTGAGT GCTTGCCGAG AAGGCTGAGC 876

AAACCCACAG CAGGATCCGC ACGGGGTTTC CACCTCAGAA CGAATGCGTT GGGCGGTGGG 936

GGCGCGAAAG AGTGGCGTTG GGGATCTGAA TTCTTCACCA TTCCACCCAC TTTTGGTGAG 996

ACCTGGGGTG GAGGTCTCTA GGGTGGGAGG CTCCTGAGAG AGGCCTACCT CGGGCCTTTC 1056

CCCACTCTTG GCAATTGTTC TTTTGCCTGG AAAATTAAGT ATATGTTAGT TTTGAACGTT 1116

TGAACTGAAC AATTCTCTTT TCGGCTAGGC TTTATTGATT TGCAATGTGC TGTGTAATTA 1176

AGAGGCCTCT CTACAAAGTA CTGATAATGA ACATGTAAGC AATGCACTCA CTTCTAAGTT 1236

ACATTCATAT CTGATCTTAT TTGATTTTCA CTAGGCATAG GGAGGTAGGA GCTAATAATA 1296

CGTTTATTTT ACTAGAAGTT AACTGGAATT CAGATTATAT AACTCTTTTC AGGTTACAAA 1356

GAACATAAAT AATCTGGTTT TCTGATGTTA TTTCAAGTAC TACAGCTGCT TCTAATCTTA 1416

GTTGACAGTG ATTTTGCCCT GTAGTGTAGC ACAGTGTTCT GTGGGTCACA CGCCGGCCTC 1476

AGCACAGCAC TTTGAGTTTT GGTACTACGT GTATCCACAT TTTACACATG ACAAGAATGA 1536

GGCATGGCAC GGCCTGCTTC CTGGCAAATT TATTCAATGG TACACTGGGC TTTGGTGGCA 1596

GAGCTCATGT CTCCACTTGA TAGCTATGAT TCTTAAACAT CACACTGCAT TAGAGGTTGA 1656

ATAATAAAAT TTCATGTTGA GCAGAAATAT TCATTGTTTA CAAGTGTAAA TGAGTCCCAG 1716

CCATGTGTTG CACTGTTCAA GCCCCAAGGG AGAGAGCAGG GAAACAAGTC TTTACCCTTT 1776

GATATTTTGC ATTCTAGTGG GAGAGATGAC AATAAGCAAA TGAGCAGAAA GATATACAAC 1836

ATCAGGAAAT CATGGGTGTT GTGAGAAGCA GAGAAGTCAG GGCAAGTCAC TCTGGGGCTG 1896

ACACTTGAGC AGAGACATGA AGGAAATAAG AATGATATTG ACTGGGAGCA GTATTTCCCA 1956

GGCAAACTGA GTGGGCCTGG CAAGTTGGAT TAAAAAGCGG GTTTTCTCAG CACTACTCAT 2016

GTGTGTGTGT GTGGGGGGGG GGGGCGGCGT GGGGGTGGGA AGGGGGACTA CCATCTGCAT 2076

GTAGGATGTC TAGCAGTATC CTGTCCTCCC TACTCACTAG GTGCTAGGAG CACTCCCCCA 2136

GTCTTGACAA CCAAAAATGT CTCTAAACTT TGCCACATGT CACCTAGTAG ACAAACTCCT 2196

GGTTAAGAAG CTCGGGTTGA AAAAAATAAA CAAGTAGTGC TGGGGAGTAG AGGCCAAGAA 2256

GTAGGTAATG GGCTCAGAAG AGGAGCCACA AACAAGGTTG TGCAGGCGCC TGTAGGCTGT 2316

GGTGTGAATT CTAGCCAAGG AGTAACAGTG ATCTGTCACA GGCTTTTAAA AGATTGCTCT 2376 

GGCTGCTATG TGGAAAGCAG AATGAAGGGA GCAACAGTAA AAGCAGGGAG CCCAGCCAGG 2436 

AAGCTGTTAC ACAGTCCAGG CAAGAGGTAG TGGAGTGGGC TGGGTGGGAA CAGAAAAGGG 2496 

AGTGACAAAC CATTGTCTCC TGAATATATT CTGAAGGAAG TTGCTGAAGG ATTCTATGTT 2556 

GTGTGAGAGA AAGAGAAGAA TTGGCTGGGT GTAGTAGCTC ATGCCAAGGA GGAGGCCAAG 2616 

GAGAGCAGAT TCCTGAGCTC AGGAGTTCAA GACCAGCCTG GGCAACACAG CAAAACCCCT 2676 

TCTCTACAAA AAATACAAAA ATTAGCTGGG TGTGGTGGCA TGCACCTGTG ATCCTAGCTA 2736 

CTCGGGAGGC TGAGGTGGAG GGTATTGCTT GAGCCCAGGA AGTTGAGGCT GCAGTGAGCC 2796 

ATGACTGTGC CACTGTACTT CAGCCTAGGT GACAGAGCAA GACCCTGTCT CCCCTGACCC 2856 

CCTGAAAAAG AGAAGAGTTA AAGTTGACTT TGTTCTTTAT TTTAATTTTA TTGGCCTGAG 2916 

CAGTGGGGTA ATTGGCAATG CCATTTCTGA GATGGTGAAG GCAGAGGAAA GAGCAGTTTG 2976 

GGGTAAATCA AGGATCTGCA TTTGGGACAT GTTAAGTTTG AGATTCCAGT CAGGCTTCCA 3036 

AGTGGTGAGG CCACATAGGC AGTTCAGTGT AAGAATTCAG GACCAAGGCT GGGCACGGTG 3096 

GCTCACTTCT GTAATCCCAG CACTTTGGTG GCTGAGGCAG GTAGATCATT TGAGGTCAGG 3156 

AGTTTGAGAC AAGCTTGGCC AACATGGTGA AACCCCATGT CTACTAAAAA TACAAAAATT 3216 

AGCCTGGTGT GGTGGCGCAC GCCTATAGTC CCAGGTTTTC AGGAGGCTTA GGTAGGAGAA 3276 

TCCCTTGAAC CCAGGAGGTG CAGGTTGCAG TGAGCTGAGA TTGTGCCACT GCACTCCAGC 3336 

CTGGGTGATA GAGTGAGACT CTGTCTCAAA AAAAAAAAAA AAAAAAAAAA TU^AAAACTGA 3396 

AGGAATTATT CCTCAGGATT TGGGTCTAAT TTGGCCTGAG CACCAACTCC TGAGTTCAAC 3456 

TACCATGGCT AGACACACCT TAACATTTTC TAGAATCCAC CAGCTTTAGT GGAGTCTGTC 3516 

TAATCATGAG TATTGGAATA GGATCTGGGG GCAGTGAGGG GGTGGCAGCC ACGTGTGGCA 3576 

GAGAAAAGCA CACAAGGAAA GAGCACCCAG GACTGTCATA TGGAAGAAAG ACAGGACTGC 3636 

AACTCACCCT TCACAAAATG AGGACCAGAC ACAGCTGATG GTATGAGTTG ATGCAGGTGT 3696 

GTGGAGCCTC AACATCCTGC TCCCCTCCTA CTACACATGG TTAAGGCCTG TTGCTCTGTC 3756 

TCCAG GT TCA CAC TCT CTG CAC TAC CTC TTC ATG GGT GCC TCA GAG 3802 
Arg Ser His Ser Leu His Tyr Leu Phe Met Gly Ala Ser Glu 
30 35 

CAG GAC CTT GGT CTT TCC TTG TTT GAA GCT TTG GGC TAC GTG GAT GAC 3850 
Gln Asp Leu Gly Leu Ser Leu Phe Glu Ala Leu Gly Tyr Val Asp Asp
40 45 50 55

CAG CTG TTC GTG TTC TAT GAT GAT GAG AGT CGC CGT GTG GAG CCC CGA 3898
Gln Leu Phe Val Phe Tyr Asp Asp Glu Ser Arg Arg Val Glu Pro Arg
60 65 70

ACT CCA TGG GTT TCC AGT AGA ATT TCA AGC CAG ATG TGG CTG CAG CTG 3946
Thr Pro Trp Val Ser Ser Arg Ile Ser Ser Gln Met Trp Leu Gln Leu
75 80 85

AGT CAG AGT CTG AAA GGG TGG GAT CAC ATG TTC ACT GTT GAC TTC TGG 3994
Ser Gln Ser Leu Lys Gly Trp Asp His Met Phe Thr Val Asp Phe Trp
90 95 100

ACT ATT ATG GAA AAT CAC AAC CAC AGC AAG G GTATGTGGAG AGGGGGCCTC 4045
Thr Ile Met Glu Asn His Asn His Ser Lys
105 110 

ACCTTCCTGA GGTTGTCAGA GCTTTTCATC TTTTCATGCA TCTTGAAGGA AACAGCTGGA 4105 

AGTCTGAGGT CTTGTGGGAG CAGGGAAGAG GGAAGGAATT TGCTTCCTGA GATCATTTGG 4165 

TCCTTGGGGA TGGTGGAAAT AGGGACCTAT TCCTTTGGTT GCAGTTAACA AGGCTGGGGA 4225 

TTTTTCCAG AG TCC CAC ACC CTG CAG GTC ATC CTG GGC TGT GAA ATG 4272 
Glu Ser His Thr Leu Gln Val Ile Leu Gly Cys Glu Met
115 120 125

CAA GAA GAC AAC AGT ACC GAG GGC TAC TGG AAG TAC GGG TAT GAT GGG 4320
Gln Glu Asp Asn Ser Thr Glu Gly Tyr Trp Lys Tyr Gly Tyr Asp Gly
130 135 140

CAG GAC CAC CTT GAA TTC TGC CCT GAC ACA CTG GAT TGG AGA GCA GCA 4368
Gln Asp His Leu Glu Phe Cys Pro Asp Thr Leu Asp Trp Arg Ala Ala
145 150 155

GAA CCC AGG GCC TGG CCC ACC AAG CTG GAG TGG GAA AGG CAC AAG ATT 4416
Glu Pro Arg Ala Trp Pro Thr Lys Leu Glu Trp Glu Arg His Lys Ile
160 165 170

CGG GCC AGG CAG AAC AGG GCC TAC CTG GAG AGG GAC TGC CCT GCA CAG 4464
Arg Ala Arg Gln Asn Arg Ala Tyr Leu Glu Arg Asp Cys Pro Ala Gln
175 180 185 190

CTG CAG CAG TTG CTG GAG CTG GGG AGA GGT GTT TTG GAC CAA CAA G 4510
Leu Gln Gln Leu Leu Glu Leu Gly Arg Gly Val Leu Asp Gln Gln
195 200 205 



GTATGGTGGA AACACACTTC TGCCCCTATA CTCTAGTGGC AGAGTGGAGG AGGTTGCAGG 4570

GCACGGAATC CCTGGTTGGA GTTTCAGAGG TGGCTGAGGC TGTGTGCCTC TCCAAATTCT 4630

GGGAAGGGAC TTTCTCAATC CTAGAGTCTC TACCTTATAA TTGAGATGTA TGAGACAGCC 4690

ACAAGTCATG GGTTTAATTT CTTTTCTCCA TGCATATGGC TCAAAGGGAA GTGTCTATGG 4750

CCCTTGCTTT TTATTTAACC AATAATCTTT TGTATATTTA TACCTGTTAA AAATTCAGAA 4810

ATGTCAAGGC CGGGCACGGT GGCTCACCCC TGTAATCCCA GCACTTTGGG AGGCCGAGGC 4870

GGGTGGTCAC AAGGTCAGGA GTTTGAGACC AGCCTGACCA ACATGGTGAA ACCCGTCTCT 4930

AAAAAAATAC AAAAATTAGC TGGTCACAGT CATGCGCACC TGTAGTCCCA GCTAATTGGA 4990

AGGCTGAGGC AGGAGCATCG CTTGAACCTG GGAAGCGGAA GTTGCACTGA GCCAAGATCG 5050

CGCCACTGCA CTCCAGCCTA GGCAGCAGAG TGAGACTCCA TCTTAAATIAA AAAAAAAAAA 5110

AAAAAAAGAG AATTCAGAGA TCTCAGCTAT CATATGAATA CCAGGACAAA ATATCAAGTG 5170

AGGCCACTTA TCAGAGTAGA AGAATCCTTT AGGTTAAAAG TTTCTTTCAT AGAACATAGC 5230

AATAATCACT GAAGCTACCT ATCTTAC7VAG TCCGCTTCTT ATAACAATGC CTCCTAGGTT 5290

GACCCAGGTG AAACTGACCA TCTGTATTCA ATCATTTTCA ATGCACATAA AGGGCAATTT 5350

TATCTATCAG AACAAAGAAC ATGGGTAACA GATATGTATA TTTACATGTG AGGAGAACAA 5410

GCTGATCTGA CTGCTCTCCA AGTGACACTG TGTTAGAGTC CAATCTTAGG ACACAAAATG 5470

GTGTCTCTCC TGTAGCTTGT TTTTTTCTGA AAAGGGTATT TCCTTCCTCC AACCTATAGA 5530

AGGAAGTGAA AGTTCCAGTC TTCCTGGCAA GGGTAAACAG ATCCCCTCTC CTCATCCTTC 5590

CTCTTTCCTG TCAAG TG CCT CCT TTG GTG AAG GTG ACA CAT CAT GTG ACC 5640
Val Pro Pro Leu Val Lys Val Thr His His Val Thr
210 215

TCT TCA GTG ACC ACT CTA CGG TGT CGG GCC TTG AAC TAC TAC CCC CAG 5688
Ser Ser Val Thr Thr Leu Arg Cys Arg Ala Leu Asn Tyr Tyr Pro Gln
220 225 230

AAC ATC ACC ATG AAG TGG CTG AAG GAT AAG CAG CCA ATG GAT GCC AAG 5736
Asn Ile Thr Met Lys Trp Leu Lys Asp Lys Gln Pro Met Asp Ala Lys
235 240 245

GAG TTC GAA CCT AAA GAC GTA TTG CCC AAT GGG GAT GGG ACC TAC CAG 5784
Glu Phe Glu Pro Lys Asp Val Leu Pro Asn Gly Asp Gly Thr Tyr Gln
250 255 260 265

GGC TGG ATA ACC TTG GCT GTA CCC CCT GGG GAA GAG CAG AGA TAT ACG 5832
Gly Trp Ile Thr Leu Ala Val Pro Pro Gly Glu Glu Gln Arg Tyr Thr
270 275 280

TAC CAG GTG GAG CAC CCA GGC CTG GAT CAG CCC CTC ATT GTG ATC TGG G 5881
Tyr Gln Val Glu His Pro Gly Leu Asp Gln Pro Leu Ile Val Ile Trp
285 290 295 

GTATGTGACT GATGAGAGCC AGGAGCTGAG AAAATCTATT GGGGGTTGAG AGGAGTGCCT 5941 

GAGGAGGTAA TTATGGCAGT GAGATGAGGA TCTGCTCTTT GTTAGGGGGT GGGCTGAGGG 6001 

TGGCAATCAA AGGCTTTAAC TTGCTTTTTC TGTTTTAG AG CCC TCA CCG TCT 6053 

Glu Pro Ser Pro Ser 
300 

GGC ACC CTA GTC ATT GGA GTC ATC AGT GGA ATT GCT GTT TTT GTC GTC 6101 
Gly Thr Leu Val Ile Gly Val Ile Ser Gly Ile Ala Val Phe Val Val
305 310 315

ATC TTG TTC ATT GGA ATT TTG TTC ATA ATA TTA AGG AAG AGG CAG GGT 6149
Ile Leu Phe Ile Gly Ile Leu Phe Ile Ile Leu Arg Lys Arg Gln Gly
320 325 330 

TCA A GTGAGTAGGA ACAAGGGGGA AGTCTCTTAG TACCTCTGCC CCAGGGCACA 6203 

Ser 

335 

GTGGGAAGAG GGGCAGAGGG GATCTGGCAT CCATGGGAAG CATTTTTCTC ATTTATATTC 6263

TTTGGGGACA CCAGCAGCTC CCTGGGAGAC AGAAAATAAT GGTTCTCCCC AGAATGAAAG 6323

TCTCTAATTC AACAAACATC TTCAGAGCAC CTACTATTTT GCAAGAGCTG TTTAAGGTAG 6383 

TACAGGGGCT TTGAGGTTGA GAAGTCACTG TGGCTATTCT CAGAACCCAA ATCTGGTAGG 6443 

GAATGAAATT GATAGCAAGT AAATGTAGTT AAAGAAGACC CCATGAGGTC CTAAAGCAGG 6503 

CAGGAAGCAA ATGCTTAGGG TGTCAAAGGA AAGAATGATC ACATTCAGCT GGGGATCAAG 6563 

ATAGCCTTCT GGATCTTGAA GGAGAAGCTG GATTCCATTA GGTGAGGTTG AAGATGATGG 6623 

GAGGTCTACA CAGACGGAGC AACCATGCCA AGTAGGAGAG TATAAGGCAT ACTGGGAGAT 6683 

TAGAAATAAT TACTGTACCT TAACCCTGAG TTTGCGTAGC TATCACTCAC CAATTATGCA 6743 

TTTCTACCCC CTGAACATCT GTGGTGTAGG GAAAAGAGAA TCAGAAAGAA GCCAGCTCAT 6803 

ACAGAGTCCA AGGGTCTTTT GGGATATTGG GTTATGATCA CTGGGGTGTC ATTGAAGGAT 6863 

CCTAAGAAAG GAGGACCACG ATCTCCCTTA TATGGTGAAT GTGTTGTTAA GAAGTTAGAT 6923 

GAGAGGTGAG GAGACCAGTT AGAAAGCCAA TAAGCATTTC CAGATGAGAG ATAATGGTTC 6983 

TTGAAATCCA ATAGTGCCCA GGTCTAAATT GAGATGGGTG AATGAGGAAA ATAAGGAAGA 7043 

GAGAAGAGGC AAGATGGTGC CTAGGTTTGT GATGCCTCTT TCCTGGGTCT CTTGTCTCCA 7103 

CAG GA GGA GCC ATG GGG CAC TAC GTC TTA GCT GAA CGT GAG 7144
Arg Gly Ala Met Gly His Tyr Val Leu Ala Glu Arg Glu 
340 345 

TGACACGCAG CCTGCAGACT CACTGTGGGA AGGAGACAAA ACTAGAGACT CAAAGAGGGA 7204 

GTGCATTTAT GAGCTCTTCA TGTTTCAGGA GAGAGTTGAA CCTAAACATA GAAATTGCCT 7264 

GACGAACTCC TTGATTTTAG CCTTCTCTGT TCATTTCCTC AAAAAGATTT CCCCATTTAG 7324 

GTTTCTGAGT TCCTGCATGC CGGTGATCCC TAGCTGTGAC CTCTCCCCTG GAACTGTCTC 7384 

TCATGAACCT CAAGCTGCAT CTAGAGGCTT CCTTCATTTC CTCCGTCACC TCAGAGACAT 7444 

ACACCTATGT CATTTCATTT CCTATTTTTG GAAGAGGACT CCTTAAATTT GGGGGACTTA 7504 

CATGATTCAT TTTAACATCT GAGAAAAGCT TTGAACCCTG GGACGTGGCT AGTCATAACC 7564 

TTACCAGATT TTTACACATG TATCTATGCA TTTTCTGGAC CCGTTCAACT TTTCCTTTGA 7624 

ATCCTCTCTC TGTGTTACCC AGTAACTCAT CTGTCACCAA GCCTTGGGGA TTCTTCCATC 7684 

TGATTGTGAT GTGAGTTGCA CAGCTATGAA GGCTGTACAC TGCACGAATG GAAGAGGCAC 7744 

CTGTCCCAGA AAAAGCATCA TGGCTATCTG TGGGTAGTAT GATGGGTGTT TTTAGCAGGT 7804 

AGGAGGCAAA TATCTTGAAA GGGGTTGTGA AGAGGTGTTT TTTCTAATTG GCATGAAGGT 7864 

GTCATACAGA TTTGCAAAGT TTAATGGTGC CTTCATTTGG GATGCTACTC TAGTATTCCA 7924 

GACCTGAAGA ATCACAATAA TTTTCTACCT GGTCTCTCCT TGTTCTGATA ATGAAAATTA 7984 

TGATAAGGAT GATAAAAGCA CTTACTTCGT GTCCGACTCT TCTGAGCACC TACTTACATG 8044

CATTACTGCA TGCACTTCTT ACAATAATTC TATGAGATAG GTACTATTAT CCCCATTTCT 8104

TTTTTAAATG AAGAAAGTGA AGTAGGCCGG GCACGGTGGC TCACGCCTGT AATCCCAGCA 8164

CTTTGGGAGG CCAAAGCGGG TGGATCACGA GGTCAGGAGA TCGAGACCAT CCTGGCTAAC 8224

ATGGTGAAAC CCCATCTCTA ATAAAAATAC AAAAAATTAG CTGGGCGTGG TGGCAGACGC 8284

CTGTAGTCCC AGCTACTCGG AAGGCTGAGG CAGGAGAATG GCATGAACCC AGGAGGCAGA 8344

GCTTGCAGTG AGCCGAGTTT GCGCCACTGC ACTCCAGCCT AGGTGACAGA GTGAGACTCC 8404

ATCTCAAAAA AATAAAAATA AAAATAAAAA AATGAAAAAA AAAAGAAAGT GAAGTATAGA 8464

GTATCTCATA GTTTGTCAGT GATAGAAACA GGTTTCAAAC TCAGTCAATC TGACCGTTTG 8524

ATACATCTCA GACACCACTA CATTCAGTAG TTTAGATGCC TAGAATAAAT AGAGAAGGAA 8584

GGAGATGGCT CTTCTCTTGT CTCATTGTGT TTCTTCTGAG TGAGCTTGAA TCACATGAAG 8644

GGGAACAGCA GAAAACAACC AACTGATCCT CAGCTGTCAT GTTTCCTTTA AAAGTCCCTG 8704

AAGGAAGGTC CTGGAATGTG ACTCCCTTGC TCCTCTGTTG CTCTCTTTGG CATTCATTTC 8764

TTTGGACCCT ACGCAAGGAC TGTAATTGGT GGGGACAGCT AGTGGCCCTG CTGGGCTTCA 8824

CACACGGTGT CCTCCCTAGG CCAGTGCCTC TGGAGTCAGA ACTCTGGTGG TATTTCCCTC 8884

AATGAAGTGG AGTAAGCTCT CTCATTTTGA GATGGTATAA TGGAAGCCAC CAAGTGGCTT 8944

AGAGGATGCC CAGGTCCTTC CATGGAGCCA CTGGGGTTCC GGTGCACATT AAAAAAAAAA 9004

TCTAACCAGG ACATTCAGGA ATTGCTAGAT TCTGGGAAAT CAGTTCACCA TGTTCAAAAG 9064

AGTCTTTTTT TTTTTTTTGA GACTCTATTG CCCAGGCTGG AGTGCAATGG CATGATCTCG 9124

GCTCACTGTA ACCTCTGCCT CCCAGGTTCA AGCGATTCTC CTGTCTCAGC CTCCCAAGTA 9184

GCTGGGATTA CAGGCGTGCA CCACCATGCC CGGCTAATTT TTGTATTTTT AGTAGAGACA 9244

GGGTTTCACC ATGTTGGCCA GGCTGGTCTC GAACTCTCCT GACCTCGTGA TCCGCCTGCC 9304

TCGGCCTCCC AAAGTGCTGA GATTACAGGT GTGAGCCACC CTGCCCAGCC GTCAAAAGAG 9364

TCTTAATATA TATATCCAGA TGGCATGTGT TTACTTTATG TTACTACATG CACTTGGCTG 9424

CATAAATGTG GTACAAGCAT TCTGTCTTGA AGGGCAGGTG CTTCAGGATA CCATATACAG 9484

CTCAGAAGTT TCTTCTTTAG GCATTAAATT TTAGCAAAGA TATCTCATCT CTTCTTTTAA 9544

ACCATTTTCT TTTTTTGTGG TTAGAAAAGT TATGTAGAAA AAAGTAAATG TGATTTACGC 9604

TCATTGTAGA AAAGCTATAA AATGAATACA ATTAAAGCTG TTATTTAATT AGCCAGTGAA 9664

AAACTATTAA CAACTTGTCT ATTACCTGTT AGTATTATTG TTGCATTAAA AATGCATATA 9724

CTTTAATAAA TGTATATTGT ATTGTATACT GCATGATTTT ATTGAAGTTC TTGTTCATCT 9784

TGTGTATATA CTTAATCGCT TTGTCATTTT GGAGACATTT ATTTTGCTTC TAATTTCTTT 9844

ACATTTTGTC TTACGGAATA TTTTCATTCA ACTGTGGTAG CCGAATTAAT CGTGTTTCTT 9904

CACTCTAGGG ACATTGTCGT CTAAGTTGTA AGACATTGGT TATTTTACCA GCAAACCATT 9964

CTGAAAGCAT ATGACAAATT ATTTCTCTCT TAATATCTTA CTATACTGAA AGCAGACTGC 10024

TATAAGGCTT CACTTACTCT TCTACCTCAT AAGGAATATG TTACAATTAA TTTATTAGGT 10084

AAGCATTTGT TTTATATTGG TTTTATTTCA CCTGGGCTGA GATTTCAAGA AACACCCCAG 10144

TCTTCACAGT AACACATTTC ACTAACACAT TTACTAAACA TCAGCAACTG TGGCCTGTTA 10204

ATTTTTTTAA TAGAAATTTT AAGTCCTCAT TTTCTTTCGG TGTTTTTTAA GCTTAATTTT 10264

TCTGGCTTTA TTCATAAATT CTTAAGGTCA ACTACATTTG AAAAATCAAA GACCTGCATT 10324

TTAAATTCTT ATTCACCTCT GGCAAAACCA TTCACAAACC ATGGTAGTAA AGAGAAGGGT 10384

GACACCTGGT GGCCATAGGT AAATGTACCA CGGTGGTCCG GTGACCAGAG ATGCAGCGCT 10444

GAGGGTTTTC CTGAAGGTAA AGGAATAAAG AATGGGTGGA GGGGCGTGCA CTGGAAATCA 10504

CTTGTAGAGA AAAGCCCCTG AAAATTTGAG AAAACAAACA AGAAACTACT TACCAGCTAT 10564

TTGAATTGCT GGAATCACAG GCCATTGCTG AGCTGCCTGA ACTGGGAACA CAACAGAAGG 10624

AAAACAAACC ACTCTGATAA TCATTGAGTC AAGTACAGCA GGTGATTGAG GACTGCTGAG 10684

AGGTACAGGC CAAAATTCTT ATGTTGTATT ATAATAATGT CATCTTATAA TACTGTCAGT 10744

ATTTTATAAA ACATTCTTCA CAAACTCACA CACATTTAAA AACAAAACAC TGTCTCTAAA 10804

ATCCCCAAAT TTTTCATAAA C 10825



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 348 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Met Gly Pro Arg Ala Arg Pro Ala Leu Leu Leu Leu Met Leu Leu Gln
1 5 10 15

Thr Ala Val Leu Gln Gly Arg Leu Leu Arg Ser His Ser Leu His Tyr
20 25 30

Leu Phe Met Gly Ala Ser Glu Gln Asp Leu Gly Leu Ser Leu Phe Glu
35 40 45

Ala Leu Gly Tyr Val Asp Asp Gln Leu Phe Val Phe Tyr Asp Asp Glu
50 55 60

Ser Arg Arg Val Glu Pro Arg Thr Pro Trp Val Ser Ser Arg Ile Ser
65 70 75 80

Ser Gln Met Trp Leu Gln Leu Ser Gln Ser Leu Lys Gly Trp Asp His
85 90 95

Met Phe Thr Val Asp Phe Trp Thr Ile Met Glu Asn His Asn His Ser
100 105 110

Lys Glu Ser His Thr Leu Gln Val Ile Leu Gly Cys Glu Met Gln Glu
115 120 125

Asp Asn Ser Thr Glu Gly Tyr Trp Lys Tyr Gly Tyr Asp Gly Gln Asp
130 135 140

His Leu Glu Phe Cys Pro Asp Thr Leu Asp Trp Arg Ala Ala Glu Pro
145 150 155 160

Arg Ala Trp Pro Thr Lys Leu Glu Trp Glu Arg His Lys Ile Arg Ala
165 170 175

Arg Gln Asn Arg Ala Tyr Leu Glu Arg Asp Cys Pro Ala Gln Leu Gln
180 185 190

Gln Leu Leu Glu Leu Gly Arg Gly Val Leu Asp Gln Gln Val Pro Pro
195 200 205

Leu Val Lys Val Thr His His Val Thr Ser Ser Val Thr Thr Leu Arg
210 215 220

Cys Arg Ala Leu Asn Tyr Tyr Pro Gln Asn Ile Thr Met Lys Trp Leu
225 230 235 240

Lys Asp Lys Gln Pro Met Asp Ala Lys Glu Phe Glu Pro Lys Asp Val
245 250 255

Leu Pro Asn Gly Asp Gly Thr Tyr Gln Gly Trp Ile Thr Leu Ala Val
260 265 270

Pro Pro Gly Glu Glu Gln Arg Tyr Thr Tyr Gln Val Glu His Pro Gly
275 280 285

Leu Asp Gln Pro Leu Ile Val Ile Trp Glu Pro Ser Pro Ser Gly Thr
290 295 300

Leu Val Ile Gly Val Ile Ser Gly Ile Ala Val Phe Val Val Ile Leu
305 310 315 320

Phe Ile Gly Ile Leu Phe Ile Ile Leu Arg Lys Arg Gln Gly Ser Arg
325 330 335

Gly Ala Met Gly His Tyr Val Leu Ala Glu Arg Glu
340 345


(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1440 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 






(B) LOCATION: 222.. 1268 



(ix) FEATURE: 

(A) NAME/KEY: allele 

(B) LOCATION: replace (408, "c") 

(D) OTHER INFORMATION: /phenotype= "normal or wild-type

(unaffected) " 
/label= 24d2 

(ix) FEATURE: 

(A) NAME/KEY: allele 

(B) LOCATION: replace (414, "a") 

(D) OTHER INFORMATION: /phenotype= "normal or wild-type 

(unaffected) " 
/label= 24d7 

(ix) FEATURE: 

(A) NAME/KEY: allele 

(B) LOCATION: replace (1066, "g")

(D) OTHER INFORMATION: /phenotype= "normal or wild-type

(unaffected) "
/label= 24d1



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

GGGGACACTG GATCACCTAG TGTTTCACAA GCAGGTACCT TCTGCTGTAG GAGAGAGAGA 60 

ACTAAAGTTC TGAAAGACCT GTTGCTTTTC ACCAGGAAGT TTTACTGGGC ATCTCCTGAG 120 

CCTAGGCAAT AGCTGTAGGG TGACTTCTGG AGCCATCCCC GTTTCCCCGC CCCCCAAAAG 180 

AAGCGGAGAT TTAACGGGGA CGTGCGGCCA GAGCTGGGGA A ATG GGC CCG CGA 233 

Met Gly Pro Arg 
1 

GCC AGG CCG GCG CTT CTC CTC CTG ATG CTT TTG CAG ACC GCG GTC CTG 281 
Ala Arg Pro Ala Leu Leu Leu Leu Met Leu Leu Gln Thr Ala Val Leu
5 10 15 20

CAG GGG CGC TTG CTG CGT TCA CAC TCT CTG CAC TAC CTC TTC ATG GGT 329
Gln Gly Arg Leu Leu Arg Ser His Ser Leu His Tyr Leu Phe Met Gly
25 30 35

GCC TCA GAG CAG GAC CTT GGT CTT TCC TTG TTT GAA GCT TTG GGC TAC 377
Ala Ser Glu Gln Asp Leu Gly Leu Ser Leu Phe Glu Ala Leu Gly Tyr
40 45 50

GTG GAT GAC CAG CTG TTC GTG TTC TAT GAT CAT GAG AGT CGC CGT GTG 425
Val Asp Asp Gln Leu Phe Val Phe Tyr Asp His Glu Ser Arg Arg Val
55 60 65

GAG CCC CGA ACT CCA TGG GTT TCC AGT AGA ATT TCA AGC CAG ATG TGG 473
Glu Pro Arg Thr Pro Trp Val Ser Ser Arg Ile Ser Ser Gln Met Trp
70 75 80

CTG CAG CTG AGT CAG AGT CTG AAA GGG TGG GAT CAC ATG TTC ACT GTT 521
Leu Gln Leu Ser Gln Ser Leu Lys Gly Trp Asp His Met Phe Thr Val
85 90 95 100

GAC TTC TGG ACT ATT ATG GAA AAT CAC AAC CAC AGC AAG GAG TCC CAC 569
Asp Phe Trp Thr Ile Met Glu Asn His Asn His Ser Lys Glu Ser His
105 110 115

ACC CTG CAG GTC ATC CTG GGC TGT GAA ATG CAA GAA GAC AAC AGT ACC 617
Thr Leu Gln Val Ile Leu Gly Cys Glu Met Gln Glu Asp Asn Ser Thr
120 125 130

GAG GGC TAC TGG AAG TAC GGG TAT GAT GGG CAG GAC CAC CTT GAA TTC 665
Glu Gly Tyr Trp Lys Tyr Gly Tyr Asp Gly Gln Asp His Leu Glu Phe
135 140 145

TGC CCT GAC ACA CTG GAT TGG AGA GCA GCA GAA CCC AGG GCC TGG CCC 713
Cys Pro Asp Thr Leu Asp Trp Arg Ala Ala Glu Pro Arg Ala Trp Pro
150 155 160

ACC AAG CTG GAG TGG GAA AGG CAC AAG ATT CGG GCC AGG CAG AAC AGG 761
Thr Lys Leu Glu Trp Glu Arg His Lys Ile Arg Ala Arg Gln Asn Arg
165 170 175 180

GCC TAC CTG GAG AGG GAC TGC CCT GCA CAG CTG CAG CAG TTG CTG GAG 809
Ala Tyr Leu Glu Arg Asp Cys Pro Ala Gln Leu Gln Gln Leu Leu Glu
185 190 195

CTG GGG AGA GGT GTT TTG GAC CAA CAA GTG CCT CCT TTG GTG AAG GTG 857
Leu Gly Arg Gly Val Leu Asp Gln Gln Val Pro Pro Leu Val Lys Val
200 205 210

ACA CAT CAT GTG ACC TCT TCA GTG ACC ACT CTA CGG TGT CGG GCC TTG 905
Thr His His Val Thr Ser Ser Val Thr Thr Leu Arg Cys Arg Ala Leu
215 220 225

AAC TAC TAC CCC CAG AAC ATC ACC ATG AAG TGG CTG AAG GAT AAG CAG 953
Asn Tyr Tyr Pro Gln Asn Ile Thr Met Lys Trp Leu Lys Asp Lys Gln
230 235 240

CCA ATG GAT GCC AAG GAG TTC GAA CCT AAA GAC GTA TTG CCC AAT GGG 1001
Pro Met Asp Ala Lys Glu Phe Glu Pro Lys Asp Val Leu Pro Asn Gly
245 250 255 260

GAT GGG ACC TAC CAG GGC TGG ATA ACC TTG GCT GTA CCC CCT GGG GAA 1049
Asp Gly Thr Tyr Gln Gly Trp Ile Thr Leu Ala Val Pro Pro Gly Glu
265 270 275

GAG CAG AGA TAT ACG TGC CAG GTG GAG CAC CCA GGC CTG GAT CAG CCC 1097
Glu Gln Arg Tyr Thr Cys Gln Val Glu His Pro Gly Leu Asp Gln Pro
280 285 290

CTC ATT GTG ATC TGG GAG CCC TCA CCG TCT GGC ACC CTA GTC ATT GGA 1145
Leu Ile Val Ile Trp Glu Pro Ser Pro Ser Gly Thr Leu Val Ile Gly
295 300 305

GTC ATC AGT GGA ATT GCT GTT TTT GTC GTC ATC TTG TTC ATT GGA ATT 1193
Val Ile Ser Gly Ile Ala Val Phe Val Val Ile Leu Phe Ile Gly Ile
310 315 320

TTG TTC ATA ATA TTA AGG AAG AGG CAG GGT TCA AGA GGA GCC ATG GGG 1241
Leu Phe Ile Ile Leu Arg Lys Arg Gln Gly Ser Arg Gly Ala Met Gly
325 330 335 340

CAC TAC GTC TTA GCT GAA CGT GAG TGACACGCAG CCTGCAGACT CACTGTGGGA 1295
His Tyr Val Leu Ala Glu Arg Glu
345

AGGAGACAAA ACTAGAGACT CAAAGAGGGA GTGCATTTAT GAGCTCTTCA TGTTTCAGGA 1355

GAGAGTTGAA CCTAAACATA GAAATTGCCT GACGAACTCC TTGATTTTAG CCTTCTCTGT 1415

TCATTTCCTC AAAAAGATTT CCCCA 1440



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1440 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 222.. 1268 



(ix) FEATURE: 

(A) NAME/KEY: allele 

(B) LOCATION: replace (1066 , "a") 

(D) OTHER INFORMATION: /phenotype= "Hereditary Hemochromatosis 

(HH) " 

/label= 24d1



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

GGGGACACTG GATCACCTAG TGTTTCACAA GCAGGTACCT TCTGCTGTAG GAGAGAGAGA 60 

ACTAAAGTTC TGAAAGACCT GTTGCTTTTC ACCAGGAAGT TTTACTGGGC ATCTCCTGAG 120 

CCTAGGCAAT AGCTGTAGGG TGACTTCTGG AGCCATCCCC GTTTCCCCGC CCCCCAAAAG 180 

AAGCGGAGAT TTAACGGGGA CGTGCGGCCA GAGCTGGGGA A ATG GGC CCG CGA 233 

Met Gly Pro Arg 
1 

GCC AGG CCG GCG CTT CTC CTC CTG ATG CTT TTG CAG ACC GCG GTC CTG 281 
Ala Arg Pro Ala Leu Leu Leu Leu Met Leu Leu Gln Thr Ala Val Leu
5 10 15 20

CAG GGG CGC TTG CTG CGT TCA CAC TCT CTG CAC TAC CTC TTC ATG GGT 329
Gln Gly Arg Leu Leu Arg Ser His Ser Leu His Tyr Leu Phe Met Gly
25 30 35

GCC TCA GAG CAG GAC CTT GGT CTT TCC TTG TTT GAA GCT TTG GGC TAC 377
Ala Ser Glu Gln Asp Leu Gly Leu Ser Leu Phe Glu Ala Leu Gly Tyr
40 45 50

GTG GAT GAC CAG CTG TTC GTG TTC TAT GAT CAT GAG AGT CGC CGT GTG 425
Val Asp Asp Gln Leu Phe Val Phe Tyr Asp His Glu Ser Arg Arg Val
55 60 65

GAG CCC CGA ACT CCA TGG GTT TCC AGT AGA ATT TCA AGC CAG ATG TGG 473
Glu Pro Arg Thr Pro Trp Val Ser Ser Arg Ile Ser Ser Gln Met Trp
70 75 80

CTG CAG CTG AGT CAG AGT CTG AAA GGG TGG GAT CAC ATG TTC ACT GTT 521
Leu Gln Leu Ser Gln Ser Leu Lys Gly Trp Asp His Met Phe Thr Val
85 90 95 100

GAC TTC TGG ACT ATT ATG GAA AAT CAC AAC CAC AGC AAG GAG TCC CAC 569
Asp Phe Trp Thr Ile Met Glu Asn His Asn His Ser Lys Glu Ser His
105 110 115

ACC CTG CAG GTC ATC CTG GGC TGT GAA ATG CAA GAA GAC AAC AGT ACC 617
Thr Leu Gln Val Ile Leu Gly Cys Glu Met Gln Glu Asp Asn Ser Thr
120 125 130

GAG GGC TAC TGG AAG TAC GGG TAT GAT GGG CAG GAC CAC CTT GAA TTC 665
Glu Gly Tyr Trp Lys Tyr Gly Tyr Asp Gly Gln Asp His Leu Glu Phe
135 140 145

TGC CCT GAC ACA CTG GAT TGG AGA GCA GCA GAA CCC AGG GCC TGG CCC 713
Cys Pro Asp Thr Leu Asp Trp Arg Ala Ala Glu Pro Arg Ala Trp Pro
150 155 160

ACC AAG CTG GAG TGG GAA AGG CAC AAG ATT CGG GCC AGG CAG AAC AGG 761
Thr Lys Leu Glu Trp Glu Arg His Lys Ile Arg Ala Arg Gln Asn Arg
165 170 175 180

GCC TAC CTG GAG AGG GAC TGC CCT GCA CAG CTG CAG CAG TTG CTG GAG 809
Ala Tyr Leu Glu Arg Asp Cys Pro Ala Gln Leu Gln Gln Leu Leu Glu
185 190 195

CTG GGG AGA GGT GTT TTG GAC CAA CAA GTG CCT CCT TTG GTG AAG GTG 857
Leu Gly Arg Gly Val Leu Asp Gln Gln Val Pro Pro Leu Val Lys Val
200 205 210

ACA CAT CAT GTG ACC TCT TCA GTG ACC ACT CTA CGG TGT CGG GCC TTG 905
Thr His His Val Thr Ser Ser Val Thr Thr Leu Arg Cys Arg Ala Leu
215 220 225

AAC TAC TAC CCC CAG AAC ATC ACC ATG AAG TGG CTG AAG GAT AAG CAG 953
Asn Tyr Tyr Pro Gln Asn Ile Thr Met Lys Trp Leu Lys Asp Lys Gln
230 235 240

CCA ATG GAT GCC AAG GAG TTC GAA CCT AAA GAC GTA TTG CCC AAT GGG 1001
Pro Met Asp Ala Lys Glu Phe Glu Pro Lys Asp Val Leu Pro Asn Gly
245 250 255 260

GAT GGG ACC TAC CAG GGC TGG ATA ACC TTG GCT GTA CCC CCT GGG GAA 1049
Asp Gly Thr Tyr Gln Gly Trp Ile Thr Leu Ala Val Pro Pro Gly Glu
265 270 275

GAG CAG AGA TAT ACG TAC CAG GTG GAG CAC CCA GGC CTG GAT CAG CCC 1097
Glu Gln Arg Tyr Thr Tyr Gln Val Glu His Pro Gly Leu Asp Gln Pro
280 285 290

CTC ATT GTG ATC TGG GAG CCC TCA CCG TCT GGC ACC CTA GTC ATT GGA 1145
Leu Ile Val Ile Trp Glu Pro Ser Pro Ser Gly Thr Leu Val Ile Gly
295 300 305

GTC ATC AGT GGA ATT GCT GTT TTT GTC GTC ATC TTG TTC ATT GGA ATT 1193
Val Ile Ser Gly Ile Ala Val Phe Val Val Ile Leu Phe Ile Gly Ile
310 315 320

TTG TTC ATA ATA TTA AGG AAG AGG CAG GGT TCA AGA GGA GCC ATG GGG 1241
Leu Phe Ile Ile Leu Arg Lys Arg Gln Gly Ser Arg Gly Ala Met Gly
325 330 335 340

CAC TAC GTC TTA GCT GAA CGT GAG TGACACGCAG CCTGCAGACT CACTGTGGGA 1295
His Tyr Val Leu Ala Glu Arg Glu
345

AGGAGACAAA ACTAGAGACT CAAAGAGGGA GTGCATTTAT GAGCTCTTCA TGTTTCAGGA 1355 

GAGAGTTGAA CCTAAACATA GAAATTGCCT GACGAACTCC TTGATTTTAG CCTTCTCTGT 1415 

TCATTTCCTC AAAAAGATTT CCCCA 1440 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1440 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 222..1268



(ix) FEATURE: 

(A) NAME/KEY: allele 

(B) LOCATION: replace (408, "g") 

(D) OTHER INFORMATION: /phenotype= "Hereditary Hemochromatosis 

(HH)"

/label= 24d2



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

GGGGACACTG GATCACCTAG TGTTTCACAA GCAGGTACCT TCTGCTGTAG GAGAGAGAGA 60 

ACTAAAGTTC TGAAAGACCT GTTGCTTTTC ACCAGGAAGT TTTACTGGGC ATCTCCTGAG 120 

CCTAGGCAAT AGCTGTAGGG TGACTTCTGG AGCCATCCCC GTTTCCCCGC CCCCCAAAAG 180 

AAGCGGAGAT TTAACGGGGA CGTGCGGCCA GAGCTGGGGA A ATG GGC CCG CGA 233 

Met Gly Pro Arg 
1 

GCC AGG CCG GCG CTT CTC CTC CTG ATG CTT TTG CAG ACC GCG GTC CTG 281 
Ala Arg Pro Ala Leu Leu Leu Leu Met Leu Leu Gln Thr Ala Val Leu
5 10 15 20 

CAG GGG CGC TTG CTG CGT TCA CAC TCT CTG CAC TAC CTC TTC ATG GGT 329 
Gln Gly Arg Leu Leu Arg Ser His Ser Leu His Tyr Leu Phe Met Gly
25 30 35 

GCC TCA GAG CAG GAC CTT GGT CTT TCC TTG TTT GAA GCT TTG GGC TAC 377 
Ala Ser Glu Gln Asp Leu Gly Leu Ser Leu Phe Glu Ala Leu Gly Tyr
40 45 50 



GTG GAT GAC CAG CTG TTC GTG TTC TAT GAT GAT GAG AGT CGC CGT GTG 425
Val Asp Asp Gln Leu Phe Val Phe Tyr Asp Asp Glu Ser Arg Arg Val
55 60 65

GAG CCC CGA ACT CCA TGG GTT TCC AGT AGA ATT TCA AGC CAG ATG TGG 473 
Glu Pro Arg Thr Pro Trp Val Ser Ser Arg Ile Ser Ser Gln Met Trp
70 75 80 

CTG CAG CTG AGT CAG AGT CTG AAA GGG TGG GAT CAC ATG TTC ACT GTT 521 
Leu Gln Leu Ser Gln Ser Leu Lys Gly Trp Asp His Met Phe Thr Val
85 90 95 100 

GAC TTC TGG ACT ATT ATG GAA AAT CAC AAC CAC AGC AAG GAG TCC CAC 569 
Asp Phe Trp Thr Ile Met Glu Asn His Asn His Ser Lys Glu Ser His
105 110 115 

ACC CTG CAG GTC ATC CTG GGC TGT GAA ATG CAA GAA GAC AAC AGT ACC 617 
Thr Leu Gln Val Ile Leu Gly Cys Glu Met Gln Glu Asp Asn Ser Thr
120 125 130 

GAG GGC TAC TGG AAG TAC GGG TAT GAT GGG CAG GAC CAC CTT GAA TTC 665 
Glu Gly Tyr Trp Lys Tyr Gly Tyr Asp Gly Gln Asp His Leu Glu Phe
135 140 145 

TGC CCT GAC ACA CTG GAT TGG AGA GCA GCA GAA CCC AGG GCC TGG CCC 713 
Cys Pro Asp Thr Leu Asp Trp Arg Ala Ala Glu Pro Arg Ala Trp Pro 
150 155 160 

ACC AAG CTG GAG TGG GAA AGG CAC AAG ATT CGG GCC AGG CAG AAC AGG 761 
Thr Lys Leu Glu Trp Glu Arg His Lys Ile Arg Ala Arg Gln Asn Arg
165 170 175 180 

GCC TAC CTG GAG AGG GAC TGC CCT GCA CAG CTG CAG CAG TTG CTG GAG 809 
Ala Tyr Leu Glu Arg Asp Cys Pro Ala Gln Leu Gln Gln Leu Leu Glu
185 190 195 

CTG GGG AGA GGT GTT TTG GAC CAA CAA GTG CCT CCT TTG GTG AAG GTG 857 
Leu Gly Arg Gly Val Leu Asp Gln Gln Val Pro Pro Leu Val Lys Val
200 205 210 

ACA CAT CAT GTG ACC TCT TCA GTG ACC ACT CTA CGG TGT CGG GCC TTG 905 
Thr His His Val Thr Ser Ser Val Thr Thr Leu Arg Cys Arg Ala Leu 
215 220 225 

AAC TAC TAC CCC CAG AAC ATC ACC ATG AAG TGG CTG AAG GAT AAG CAG 953 
Asn Tyr Tyr Pro Gln Asn Ile Thr Met Lys Trp Leu Lys Asp Lys Gln
230 235 240 

CCA ATG GAT GCC AAG GAG TTC GAA CCT AAA GAC GTA TTG CCC AAT GGG 1001 
Pro Met Asp Ala Lys Glu Phe Glu Pro Lys Asp Val Leu Pro Asn Gly 
245 250 255 260 

GAT GGG ACC TAC CAG GGC TGG ATA ACC TTG GCT GTA CCC CCT GGG GAA 1049 
Asp Gly Thr Tyr Gln Gly Trp Ile Thr Leu Ala Val Pro Pro Gly Glu
265 270 275 

GAG CAG AGA TAT ACG TGC CAG GTG GAG CAC CCA GGC CTG GAT CAG CCC 1097 
Glu Gln Arg Tyr Thr Cys Gln Val Glu His Pro Gly Leu Asp Gln Pro
280 285 290 

CTC ATT GTG ATC TGG GAG CCC TCA CCG TCT GGC ACC CTA GTC ATT GGA 1145 
Leu Ile Val Ile Trp Glu Pro Ser Pro Ser Gly Thr Leu Val Ile Gly
295 300 305

GTC ATC AGT GGA ATT GCT GTT TTT GTC GTC ATC TTG TTC ATT GGA ATT 1193
Val Ile Ser Gly Ile Ala Val Phe Val Val Ile Leu Phe Ile Gly Ile
310 315 320 

TTG TTC ATA ATA TTA AGG AAG AGG CAG GGT TCA AGA GGA GCC ATG GGG 1241 
Leu Phe Ile Ile Leu Arg Lys Arg Gln Gly Ser Arg Gly Ala Met Gly
325 330 335 340 

CAC TAC GTC TTA GCT GAA CGT GAG TGACACGCAG CCTGCAGACT CACTGTGGGA 1295
His Tyr Val Leu Ala Glu Arg Glu 
345 

AGGAGACAAA ACTAGAGACT CAAAGAGGGA GTGCATTTAT GAGCTCTTCA TGTTTCAGGA 1355

GAGAGTTGAA CCTAAACATA GAAATTGCCT GACGAACTCC TTGATTTTAG CCTTCTCTGT 1415 

TCATTTCCTC AAAAAGATTT CCCCA 1440



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1440 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 222.. 1268 



(ix) FEATURE: 

(A) NAME/KEY: allele 

(B) LOCATION: replace (408, "g") 

(D) OTHER INFORMATION: /phenotype= "Hereditary Hemochromatosis 

(HH)"

/label= 24d2 

(ix) FEATURE: 

(A) NAME/KEY: allele 

(B) LOCATION: replace (1066, "a")

(D) OTHER INFORMATION: /phenotype= "Hereditary Hemochromatosis

(HH)"

/label= 24d1



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

GGGGACACTG GATCACCTAG TGTTTCACAA GCAGGTACCT TCTGCTGTAG GAGAGAGAGA 60 

ACTAAAGTTC TGAAAGACCT GTTGCTTTTC ACCAGGAAGT TTTACTGGGC ATCTCCTGAG 120 

CCTAGGCAAT AGCTGTAGGG TGACTTCTGG AGCCATCCCC GTTTCCCCGC CCCCCAAAAG 180 

AAGCGGAGAT TTAACGGGGA CGTGCGGCCA GAGCTGGGGA A ATG GGC CCG CGA 233 

Met Gly Pro Arg 
1

GCC AGG CCG GCG CTT CTC CTC CTG ATG CTT TTG CAG ACC GCG GTC CTG 281
Ala Arg Pro Ala Leu Leu Leu Leu Met Leu Leu Gln Thr Ala Val Leu
5 10 15 20

CAG GGG CGC TTG CTG CGT TCA CAC TCT CTG CAC TAC CTC TTC ATG GGT 329
Gln Gly Arg Leu Leu Arg Ser His Ser Leu His Tyr Leu Phe Met Gly
25 30 35 

GCC TCA GAG CAG GAC CTT GGT CTT TCC TTG TTT GAA GCT TTG GGC TAC 377
Ala Ser Glu Gln Asp Leu Gly Leu Ser Leu Phe Glu Ala Leu Gly Tyr
40 45 50 

GTG GAT GAC CAG CTG TTC GTG TTC TAT GAT GAT GAG AGT CGC CGT GTG 425 
Val Asp Asp Gln Leu Phe Val Phe Tyr Asp Asp Glu Ser Arg Arg Val
55 60 65 

GAG CCC CGA ACT CCA TGG GTT TCC AGT AGA ATT TCA AGC CAG ATG TGG 473 
Glu Pro Arg Thr Pro Trp Val Ser Ser Arg Ile Ser Ser Gln Met Trp
70 75 80 

CTG CAG CTG AGT CAG AGT CTG AAA GGG TGG GAT CAC ATG TTC ACT GTT 521 
Leu Gln Leu Ser Gln Ser Leu Lys Gly Trp Asp His Met Phe Thr Val
85 90 95 100 

GAC TTC TGG ACT ATT ATG GAA AAT CAC AAC CAC AGC AAG GAG TCC CAC 569 
Asp Phe Trp Thr Ile Met Glu Asn His Asn His Ser Lys Glu Ser His
105 110 115 

ACC CTG CAG GTC ATC CTG GGC TGT GAA ATG CAA GAA GAC AAC AGT ACC 617 
Thr Leu Gln Val Ile Leu Gly Cys Glu Met Gln Glu Asp Asn Ser Thr
120 125 130 

GAG GGC TAC TGG AAG TAC GGG TAT GAT GGG CAG GAC CAC CTT GAA TTC 665 
Glu Gly Tyr Trp Lys Tyr Gly Tyr Asp Gly Gln Asp His Leu Glu Phe
135 140 145 

TGC CCT GAC ACA CTG GAT TGG AGA GCA GCA GAA CCC AGG GCC TGG CCC 713 
Cys Pro Asp Thr Leu Asp Trp Arg Ala Ala Glu Pro Arg Ala Trp Pro 
150 155 160 

ACC AAG CTG GAG TGG GAA AGG CAC AAG ATT CGG GCC AGG CAG AAC AGG 761 
Thr Lys Leu Glu Trp Glu Arg His Lys Ile Arg Ala Arg Gln Asn Arg
165 170 175 180 

GCC TAC CTG GAG AGG GAC TGC CCT GCA CAG CTG CAG CAG TTG CTG GAG 809 
Ala Tyr Leu Glu Arg Asp Cys Pro Ala Gln Leu Gln Gln Leu Leu Glu
185 190 195

CTG GGG AGA GGT GTT TTG GAC CAA CAA GTG CCT CCT TTG GTG AAG GTG 857 
Leu Gly Arg Gly Val Leu Asp Gln Gln Val Pro Pro Leu Val Lys Val
200 205 210 

ACA CAT CAT GTG ACC TCT TCA GTG ACC ACT CTA CGG TGT CGG GCC TTG 905 
Thr His His Val Thr Ser Ser Val Thr Thr Leu Arg Cys Arg Ala Leu 
215 220 225 

AAC TAC TAC CCC CAG AAC ATC ACC ATG AAG TGG CTG AAG GAT AAG CAG 953 
Asn Tyr Tyr Pro Gln Asn Ile Thr Met Lys Trp Leu Lys Asp Lys Gln
230 235 240 

CCA ATG GAT GCC AAG GAG TTC GAA CCT AAA GAC GTA TTG CCC AAT GGG 1001 
Pro Met Asp Ala Lys Glu Phe Glu Pro Lys Asp Val Leu Pro Asn Gly
245 250 255 260

GAT GGG ACC TAC CAG GGC TGG ATA ACC TTG GCT GTA CCC CCT GGG GAA 1049
Asp Gly Thr Tyr Gln Gly Trp Ile Thr Leu Ala Val Pro Pro Gly Glu
265 270 275

GAG CAG AGA TAT ACG TAC CAG GTG GAG CAC CCA GGC CTG GAT CAG CCC 1097
Glu Gln Arg Tyr Thr Tyr Gln Val Glu His Pro Gly Leu Asp Gln Pro
280 285 290

CTC ATT GTG ATC TGG GAG CCC TCA CCG TCT GGC ACC CTA GTC ATT GGA 1145
Leu Ile Val Ile Trp Glu Pro Ser Pro Ser Gly Thr Leu Val Ile Gly
295 300 305

GTC ATC AGT GGA ATT GCT GTT TTT GTC GTC ATC TTG TTC ATT GGA ATT 1193
Val Ile Ser Gly Ile Ala Val Phe Val Val Ile Leu Phe Ile Gly Ile
310 315 320

TTG TTC ATA ATA TTA AGG AAG AGG CAG GGT TCA AGA GGA GCC ATG GGG 1241
Leu Phe Ile Ile Leu Arg Lys Arg Gln Gly Ser Arg Gly Ala Met Gly
325 330 335 340

CAC TAC GTC TTA GCT GAA CGT GAG TGACACGCAG CCTGCAGACT CACTGTGGGA 1295
His Tyr Val Leu Ala Glu Arg Glu
345

AGGAGACAAA ACTAGAGACT CAAAGAGGGA GTGCATTTAT GAGCTCTTCA TGTTTCAGGA 1355

GAGAGTTGAA CCTAAACATA GAAATTGCCT GACGAACTCC TTGATTTTAG CCTTCTCTGT 1415

TCATTTCCTC AAAAAGATTT CCCCA 1440



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
TGGCAAGGGT AAACAGATGC 20 



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
CTCAGGCACT CCTCTCAACC 20



(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA



(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 1

(D) OTHER INFORMATION: /mod_base= OTHER

/note= "N = 5'-biotinylated guanine
(bio-G)"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
NGAAGAGCAG AGATATACGT G 21



(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA



(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 1

(D) OTHER INFORMATION: /mod_base= OTHER

/note= "N = 5'-biotinylated guanine
(bio-G)"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
NGAAGAGCAG AGATATACGT A 21



(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA



(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 1

(D) OTHER INFORMATION: /mod_base= OTHER

/note= "N = 5'-phosphorylated cytosine
(p-C)"

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 18

(D) OTHER INFORMATION: /mod_base= OTHER

/note= "N = 3'-digoxigenin-conjugated
guanine (G-dig)"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
NCAGGTGGAG CACCCAGN 18



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
CTGAAAGGGT GGGATCACAT 20 



(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
CAAGGAGTTC GTCAGGCAAT 20 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 517 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: -

(B) LOCATION: 1..517

(D) OTHER INFORMATION: /note= "normal or wild-type (unaffected)

genomic sequence surrounding variant for
24d1(G) allele corresponding to positions
5507-6023 of genomic sequence containing
the HH gene (SEQ ID NO:1)"

(ix) FEATURE:

(A) NAME/KEY: allele

(B) LOCATION: replace (328, "g")

(D) OTHER INFORMATION: /phenotype= "normal or wild-type

(unaffected)"
/label= 24d1



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 



TATTTCCTTC CTCCAACCTA TAGAAGGAAG TGAAAGTTCC AGTCTTCCTG GCAAGGGTAA 60
ACAGATCCCC TCTCCTCATC CTTCCTCTTT CCTGTCAAGT GCCTCCTTTG GTGAAGGTGA 120
CACATCATGT GACCTCTTCA GTGACCACTC TACGGTGTCG GGCCTTGAAC TACTACCCCC 180
AGAACATCAC CATGAAGTGG CTGAAGGATA AGCAGCCAAT GGATGCCAAG GAGTTCGAAC 240
CTAAAGACGT ATTGCCCAAT GGGGATGGGA CCTACCAGGG CTGGATAACC TTGGCTGTAC 300
CCCCTGGGGA AGAGCAGAGA TATACGTGCC AGGTGGAGCA CCCAGGCCTG GATCAGCCCC 360
TCATTGTGAT CTGGGGTATG TGACTGATGA GAGCCAGGAG CTGAGAAAAT CTATTGGGGG 420
TTGAGAGGAG TGCCTGAGGA GGTAATTATG GCAGTGAGAT GAGGATCTGC TCTTTGTTAG 480
GGGGTGGGCT GAGGGTGGCA ATCAAAGGCT TTAACTT 517



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 517 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(ix) FEATURE: 

(A) NAME/KEY: - 

(B) LOCATION: 1..517 

(D) OTHER INFORMATION: /note= "genomic sequence surrounding 

variant for 24d1(A) allele corresponding
to positions 5507-6023 of genomic
sequence containing the HH gene
(SEQ ID NO:3)"

(ix) FEATURE: 

(A) NAME/KEY: allele 

(B) LOCATION: replace (328, "a") 

(D) OTHER INFORMATION: /phenotype= "Hereditary Hemochromatosis

(HH)"

/label= 24d1



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
TATTTCCTTC CTCCAACCTA TAGAAGGAAG TGAAAGTTCC AGTCTTCCTG GCAAGGGTAA 60
ACAGATCCCC TCTCCTCATC CTTCCTCTTT CCTGTCAAGT GCCTCCTTTG GTGAAGGTGA 120
CACATCATGT GACCTCTTCA GTGACCACTC TACGGTGTCG GGCCTTGAAC TACTACCCCC 180
AGAACATCAC CATGAAGTGG CTGAAGGATA AGCAGCCAAT GGATGCCAAG GAGTTCGAAC 240
CTAAAGACGT ATTGCCCAAT GGGGATGGGA CCTACCAGGG CTGGATAACC TTGGCTGTAC 300
CCCCTGGGGA AGAGCAGAGA TATACGTACC AGGTGGAGCA CCCAGGCCTG GATCAGCCCC 360
TCATTGTGAT CTGGGGTATG TGACTGATGA GAGCCAGGAG CTGAGAAAAT CTATTGGGGG 420
TTGAGAGGAG TGCCTGAGGA GGTAATTATG GCAGTGAGAT GAGGATCTGC TCTTTGTTAG 480
GGGGTGGGCT GAGGGTGGCA ATCAAAGGCT TTAACTT 517



(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 361 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein



(ix) FEATURE:

(A) NAME/KEY: Protein

(B) LOCATION: 1..361
(D) OTHER INFORMATION: /note= "Rabbit leukocyte antigen (RLA)"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

Met Gly Ser Ile Pro Pro Arg Thr Leu Leu Leu Leu Leu Ala Gly Ala
1 5 10 15

Leu Thr Leu Lys Asp Thr Gln Ala Gly Ser His Ser Met Arg Tyr Phe
20 25 30

Tyr Thr Ser Val Ser Arg Pro Gly Leu Gly Glu Pro Arg Phe Ile Ile
35 40 45

Val Gly Tyr Val Asp Asp Thr Gln Phe Val Arg Phe Asp Ser Asp Ala
50 55 60

Ala Ser Pro Arg Met Glu Gln Arg Ala Pro Trp Met Gly Gln Val Glu
65 70 75 80

Pro Glu Tyr Trp Asp Gln Gln Thr Gln Ile Ala Lys Asp Thr Ala Gln
85 90 95

Thr Phe Arg Val Asn Leu Asn Thr Ala Leu Arg Tyr Tyr Asn Gln Ser
100 105 110

Ala Ala Gly Ser His Thr Phe Gln Thr Met Phe Gly Cys Glu Val Trp
115 120 125

Ala Asp Gly Arg Phe Phe His Gly Tyr Arg Gln Tyr Ala Tyr Asp Gly
130 135 140

Ala Asp Tyr Ile Ala Leu Asn Glu Asp Leu Arg Ser Trp Thr Ala Ala
145 150 155 160

Asp Thr Ala Ala Gln Asn Thr Gln Arg Lys Trp Glu Ala Ala Gly Glu
165 170 175

Ala Glu Arg His Arg Ala Tyr Leu Glu Arg Glu Cys Val Glu Trp Leu
180 185 190

Arg Arg Tyr Leu Glu Met Gly Lys Glu Thr Leu Gln Arg Ala Asp Pro
195 200 205

Pro Lys Ala His Val Thr His His Pro Ala Ser Asp Arg Glu Ala Thr
210 215 220

Leu Arg Cys Trp Ala Leu Gly Phe Tyr Pro Ala Glu Ile Ser Leu Thr
225 230 235 240

Trp Gln Arg Asp Gly Glu Asp Gln Thr Gln Asp Thr Glu Leu Val Glu
245 250 255

Thr Arg Pro Gly Gly Asp Gly Thr Phe Gln Lys Trp Ala Ala Val Val
260 265 270

Val Pro Ser Gly Glu Glu Gln Arg Tyr Thr Cys Arg Val Gln His Glu
275 280 285

Gly Leu Pro Glu Pro Leu Thr Leu Thr Trp Glu Pro Pro Ala Gln Pro
290 295 300

Thr Ala Leu Ile Val Gly Ile Val Ala Gly Val Leu Gly Val Leu Leu
305 310 315 320

Ile Leu Gly Ala Val Val Ala Val Val Arg Arg Lys Lys His Ser Ser
325 330 335

Asp Gly Lys Gly Gly Arg Tyr Thr Pro Ala Ala Gly Gly His Arg Asp
340 345 350

Gln Gly Ser Asp Asp Ser Leu Met Pro
355 360 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 365 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein

(ix) FEATURE:

(A) NAME/KEY: Protein

(B) LOCATION: 1..365

(D) OTHER INFORMATION: /note= "Human Major Histocompatibility

Class I (MHC) protein"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

Met Ala Val Met Ala Pro Arg Thr Leu Val Leu Leu Leu Ser Gly Ala
1 5 10 15

Leu Ala Leu Thr Gln Thr Trp Ala Gly Ser His Ser Met Arg Tyr Phe
20 25 30

Phe Thr Ser Val Ser Arg Pro Gly Arg Gly Glu Pro Arg Phe Ile Ala
35 40 45

Val Gly Tyr Val Asp Asp Thr Gln Phe Val Arg Phe Asp Ser Asp Ala
50 55 60

Ala Ser Gln Arg Met Glu Pro Arg Ala Pro Trp Ile Glu Gln Glu Gly
65 70 75 80

Pro Glu Tyr Trp Asp Gly Glu Thr Arg Lys Val Lys Ala His Ser Gln
85 90 95

Thr His Arg Val Asp Leu Gly Thr Leu Arg Gly Tyr Tyr Asn Gln Ser
100 105 110

Glu Ala Gly Ser His Thr Leu Gln Met Met Phe Gly Cys Asp Val Gly
115 120 125

Ser Asp Trp Arg Phe Leu Arg Gly Tyr His Gln Tyr Ala Tyr Asp Gly
130 135 140

Lys Asp Tyr Ile Ala Leu Lys Glu Asp Leu Arg Ser Trp Thr Ala Ala
145 150 155 160

Asp Met Ala Ala Gln Thr Thr Lys His Lys Trp Glu Ala Ala His Val
165 170 175

Ala Glu Gln Leu Arg Ala Tyr Leu Glu Gly Thr Cys Val Glu Trp Leu
180 185 190

Arg Arg Tyr Leu Glu Asn Gly Lys Glu Thr Leu Gln Arg Thr Asp Ala
195 200 205

Pro Lys Thr His Met Thr His His Ala Val Ser Asp His Glu Ala Thr
210 215 220

Leu Arg Cys Trp Ala Leu Ser Phe Tyr Pro Ala Glu Ile Thr Leu Thr
225 230 235 240

Trp Gln Arg Asp Gly Glu Asp Gln Thr Gln Asp Thr Glu Leu Val Glu
245 250 255

Thr Arg Pro Ala Gly Asp Gly Thr Phe Gln Lys Trp Ala Ala Val Val
260 265 270

Val Pro Ser Gly Gln Glu Gln Arg Tyr Thr Cys His Val Gln His Glu
275 280 285

Gly Leu Pro Lys Pro Leu Thr Leu Arg Trp Glu Pro Ser Ser Gln Pro
290 295 300

Thr Ile Pro Ile Val Gly Ile Ile Ala Gly Leu Val Leu Phe Gly Ala
305 310 315 320

Val Ile Thr Gly Ala Val Val Ala Ala Val Met Trp Arg Arg Lys Ser
325 330 335

Ser Asp Arg Lys Gly Gly Ser Tyr Ser Gln Ala Ala Ser Ser Asp Ser
340 345 350

Ala Gln Gly Ser Asp Val Ser Leu Thr Ala Cys Lys Val
355 360 365



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
ACATGGTTAA GGCCTGTTGC 20 



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
GCCACATCTG GCTTGAAATT 20 



(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: modified_base

(B) LOCATION: 1

(D) OTHER INFORMATION: /mod_base= OTHER

/note= "N = 5'-biotinylated adenine
(bio-A)"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: 
NGCTGTTCGT GTTCTATGAT C 21 



(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: modified_base

(B) LOCATION: 1 

(D) OTHER INFORMATION: /mod_base= OTHER 

/note= "N = 5'-biotinylated adenine
(bio-A)"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: 
NGCTGTTCGT GTTCTATGAT G 21 



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 



(ix) FEATURE: 

(A) NAME/KEY: modified_base

(B) LOCATION: 1 

(D) OTHER INFORMATION: /mod_base= OTHER 

/note= "N = 5'-phosphorylated adenine
(p-A)"

(ix) FEATURE: 

(A) NAME/KEY: modified_base

(B) LOCATION: 19 

(D) OTHER INFORMATION: /mod_base= OTHER 

/note= "N = 3'-digoxigenin-conjugated
adenine (A-dig)"



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
NTGAGAGTCG CCGTGTGGN 19



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
GGAAGAGCAG AGATATACGT GCCAGGTGGA GCACCCAGG 39



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
GGAAGAGCAG AGATATACGT ACCAGGTGGA GCACCCAGG 39



(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
CAAAAGAAGC GGAGATTTAA CG 22



(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
AGATTTAACG GGGACGTGC 19



(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
AGAGGTCACA TGATGTGTCA CC 22



(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
AGGAGGCACT TGTTGGTCC 19



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
AAAATCACAA CCACAGCAAA G 21



(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
TTCCCACAGT GAGTCTGCAG 20



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
CAATGGGGAT GGGACCTAC 19



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
ATATACGTGC CAGGTGGAGC 20



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
CCTCTTCACA ACCCCTTTCA 20



(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 
CATAGCTGTG CAACTCACAT CA 22



(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 
AGCTGTTCGT GTTCTATGAT CATGAGAGTC GCCGTGTGGA 40



(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
AGCTGTTCGT GTTCTATGAT GATGAGAGTC GCCGTGTGGA 40



(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
TGTTCTATGA TCATGAGAGT CGCCGTGTGG AG 32



(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
TGTTCTATGA TCATGAGTGT CGCCGTGTGG AG 32



